Astrostatistics school

What a wonderful week at the Astrostat [Indian] summer school in Autrans! The setting was superb, on the high Vercors plateau overlooking both Grenoble [north] and Valence [west], with the colours of the Fall at their brightest on the foliage of the forests rising on both sides of the valley and a perfect green on the fields at the centre, with sun all along, sharp mornings and warm afternoons worthy of a late Indian summer, too many running trails [turning into X country ski trails in the Winter] to contemplate for a single week [even with three hours of running over two days], many climbing sites on the numerous chalk cliffs all around [but a single afternoon for that, more later in another post!]. And of course a group of participants eager to learn about Bayesian methodology and computational algorithms,…
Original Post: Astrostatistics school

Data from Public Bicycle Hire Systems

A new rOpenSci package provides access to data to which users may already have directly contributed, and for which contribution is fun, keeps you fit, and helps make the world a better place. The data come from using public bicycle hire schemes, and the package is called bikedata. Public bicycle hire systems operate in many cities throughout the world, and most systems collect (generally anonymous) data, minimally consisting of the times and locations at which every single bicycle trip starts and ends. The bikedata package provides access to data from all cities which openly publish these data, currently including London, U.K., and in the U.S.A., New York, Los Angeles, Philadelphia, Chicago, Boston, and Washington DC. The package will expand as more cities openly publish their data (with the recently and enormously expanded San Francisco system next on the list). The short…
Original Post: Data from Public Bicycle Hire Systems

Data acquisition in R (1/4)

R is an incredible tool for reproducible research. In the present series of blog posts I want to show how one can easily acquire data within an R session, documenting every step in a fully reproducible way. There are numerous data acquisition options for R users. Of course, I do not attempt to show all the data possibilities and tend to focus mostly on demographic data. If your prime interest lies outside human population statistics, it’s worth checking the amazing Open Data Task View. The series consists of four posts: (1) Loading prepared datasets; (2) Accessing popular statistical databases; (3) Demographic data sources; (4) Getting spatial data. For each of the data acquisition options I provide a small visualization use case. For illustration purposes, many R packages include data samples. Base R comes with a datasets package that offers a wide range of simple,…
Original Post: Data acquisition in R (1/4)
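
As a minimal sketch of the "loading prepared datasets" idea from the excerpt above: the snippet loads one of the sample datasets shipped with base R's datasets package and draws a quick plot. The choice of `mtcars` and of the plotted columns is an illustrative assumption, not taken from the original post.

```r
# List the names of the sample datasets bundled with the base `datasets` package
available <- data(package = "datasets")$results[, "Item"]

# mtcars is one of them (illustrative choice); no download step is needed
data("mtcars", package = "datasets")
str(mtcars)

# A small visualization use case: fuel economy against vehicle weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

Because the data ship with R itself, the whole step is reproducible from a fresh session without any external files.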

colourpicker package v1.0: You can now select semi-transparent colours in R (& more!)

For those who aren’t familiar with the colourpicker package, it provides a colour picker for R that can be used in Shiny, as well as other related tools. Today it’s leaving behind its 0.x days and moving on to version 1.0! colourpicker has gone through a few major milestones since its inception. It began as merely a colour selector input in an unrelated package (shinyjs), simply because I didn’t think a colour picker input warranted its own package. After gaining a gadget and an RStudio addin (as well as some popularity!), it graduated to become its own package. Earlier this year, the Plot Helper tool was added. And now colourpicker is taking its next big step: an upgrade to version 1.0. Due credit: Before describing the amazing new features, I have to give credit to David Griswold who made…
Original Post: colourpicker package v1.0: You can now select semi-transparent colours in R (& more!)

My interview with ROpenSci

The ROpenSci team has started publishing a new series of interviews with the goal of “demystifying the creative and development processes of R community members”. I had the great pleasure of being interviewed by Kelly O’Briant earlier this year, and the interview was published on Friday. Thanks for being a great interviewer, Kelly! I’m looking forward to hearing from other R community members as the rest of the series is published. ROpenSci blog: .rprofile: David Smith
Original Post: My interview with ROpenSci

A Newbie’s Install of Keras & Tensorflow on Windows 10 with R

This weekend, I decided it was time: I was going to update my Python environment and get Keras and Tensorflow installed so I could start doing tutorials (particularly for deep learning) using R. Although I used to be a systems administrator (about 20 years ago), I don’t do much installing or configuring so I guess that’s why I’ve put this task off for so long. And it wasn’t unwarranted: it took me the whole weekend to get the install working. Here are the steps I used to get things running on Windows 10, leveraging clues in about 15 different online resources, and yes (I found out the hard way), the order of operations is very important. I do not claim to have nailed the order of operations here, but I have definitely found one that works. Step 0: I had already installed the tensorflow and keras…
Original Post: A Newbie’s Install of Keras & Tensorflow on Windows 10 with R

Markets Performance after Election

Coming back to markets and trading (after a while), the feeling has been that the markets, and the economy as a whole, are doing well. How well? Since I haven’t been following things closely, I had to do some forensics. Friday was the 234th trading day after the election. There were claims at different points in time that the market is the best in recent history. Let’s take a look at the top five presidential rallies since the election: As of Friday, President Trump’s rally is standing at 24.76%. That qualifies it as the 3rd best in more modern history (I will come back to this later). At least on the Dow Jones Industrial Average. It is easy to see that on two occasions, in the middle of the chart, as well as around the first 30 days, President Trump’s rally was…
Original Post: Markets Performance after Election
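
The rally figure quoted in the excerpt above reads as a cumulative percentage change in an index since the election. A minimal sketch of that arithmetic follows, assuming the rally is measured from the election-day close. The helper name `rally_pct` and all index levels are hypothetical, chosen only so that the last value reproduces the 24.76% quoted in the post; they are not actual Dow Jones data.

```r
# Cumulative percentage change from a starting close to a later close
# (hypothetical helper, not from the original post)
rally_pct <- function(close_at_election, close_now) {
  (close_now / close_at_election - 1) * 100
}

# Made-up index levels, for illustration only
rally_pct(100, 124.76)

# The same calculation applied to a whole series of daily closes,
# giving the rally as of each trading day (values are made up)
closes <- c(100, 101.5, 99.8, 124.76)
rally_path <- (closes / closes[1] - 1) * 100
```

Computing the path for each president from their own election-day close puts the rallies on a common scale, which is what makes a "top five" comparison possible.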

Citibike Business Opportunity: Advertising

Introduction: It is hard to wander around New York City without seeing rows of dozens of bright blue Citibikes planted in the middle of the busiest nooks and crannies of the city. These bikes belong to Citibike, a ride-sharing program that allows users to conveniently rent a bike to travel to their destinations without having to worry about the hassles of parking and locking their bicycle. Citibike has quickly become the preferred mode of transportation for many New Yorkers who are tired of the laundry list of issues with public transportation and are looking to get some fresh air as they travel around the city. The premise for the program is quite simple: you can choose between an annual pass for year-round access or a 3- or 7-day pass as a more temporary option. Pass holders are able…
Original Post: Citibike Business Opportunity: Advertising

Why Use Docker with R? A DevOps Perspective

There have been several blog posts going around about why one would use Docker with R. In this post I’ll try to add a DevOps point of view and explain how containerizing R is used in the context of the OpenCPU system for building and deploying R servers. “Has anyone in the #rstats world written really well about the why of their use of Docker, as opposed to the how?” (Jenny Bryan, @JennyBryan, September 29, 2017) 1: Easy Development. The flagship of the OpenCPU system is the OpenCPU server: a mature and powerful Linux stack for embedding R in systems and applications. Because OpenCPU is completely open source we can build and ship on DockerHub. A ready-to-go Linux server with both OpenCPU and RStudio can be started using the following (use port 8004 or 80): docker run -t -p 8004:8004 opencpu/rstudio Now simply…
Original Post: Why Use Docker with R? A DevOps Perspective

Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders

Sales, customer service, supply chain and logistics, manufacturing… no matter which department you’re in, you more than likely care about backorders. Backorders are products that are temporarily out of stock, but for which a customer is permitted to place an order against future inventory. Backorders are both good and bad: strong demand can drive backorders, but so can suboptimal planning. The problem is that when a product is not immediately available, customers may not have the luxury or patience to wait. This translates into lost sales and low customer satisfaction. The good news is that machine learning (ML) can be used to identify products at risk of backorders. In this article we use the new H2O automated ML algorithm to implement Kaggle-quality predictions on the Kaggle dataset, “Can You Predict Product Backorders?”. This is an advanced tutorial, which can be difficult…
Original Post: Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders