Solver Interfaces in CVXR

Introduction In our previous blog post, we introduced CVXR, an R package for disciplined convex optimization. The package allows one to describe an optimization problem with Disciplined Convex Programming rules using high-level mathematical syntax. Passing this problem definition along (with a list of constraints, if any) to the solve function transforms it into a form that can be handed off to a solver. The default installation of CVXR comes with two (imported) open source solvers: ECOS and its mixed integer cousin ECOS_BB, via the CRAN package ECOSolveR; and SCS, via the CRAN package scs. CVXR (version 0.99) can also make use of several other open source solvers implemented in R packages: the linear and mixed integer programming package lpSolve, via the lpSolveAPI package; and the linear and mixed integer programming package GLPK, via the Rglpk package. About Solvers The real work of finding a solution is done by solvers, and writing good solvers is hard work. Furthermore, some…
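The workflow the excerpt describes (build a problem from an objective and constraints, then pass it to solve(), optionally naming a solver) might look roughly like the following minimal sketch; the data here are invented purely for illustration, and the solver argument assumes ECOS is installed, as it is by default.

library(CVXR)

# Minimal sketch of the workflow described above; the data are made up.
set.seed(123)
A <- matrix(rnorm(50), nrow = 10)   # a small made-up design matrix
b <- rnorm(10)                      # a made-up response vector

beta <- Variable(5)                                  # decision variable
objective <- Minimize(sum_squares(A %*% beta - b))   # convex objective
constraints <- list(beta >= 0)                       # optional constraints
prob <- Problem(objective, constraints)

# solve() converts the problem into a standard cone form and hands it to a
# solver; the solver can be chosen explicitly, e.g. "ECOS" or "SCS".
result <- solve(prob, solver = "ECOS")
result$value            # optimal objective value
result$getValue(beta)   # optimal value of the decision variable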
Original Post: Solver Interfaces in CVXR

A First Look at NIMBLE

Writing a domain-specific language (DSL) is a powerful and fairly common method for extending the R language. Both ggplot2 and dplyr, for example, are DSLs. (See Hadley’s chapter in Advanced R for some elaboration.) In this post, I take a first look at NIMBLE (Numerical Inference for Statistical Models using Bayesian and Likelihood Estimation), a DSL for formulating and efficiently solving statistical models in general, and Bayesian hierarchical models in particular. The latter comprise a class of interpretable statistical models useful for both inference and prediction. (See Gelman’s 2006 Technometrics paper for what these models can and cannot do.) Most of what I describe here can be found in the comprehensive and very readable paper by de Valpine et al., or the extensive NIMBLE User Manual. At the risk of oversimplification, it seems to me that the essence of NIMBLE is…
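At the code level, the kind of model formulation NIMBLE’s DSL supports looks roughly like the toy sketch below; the data and priors are invented for illustration, and it assumes the one-call nimbleMCMC() interface.

library(nimble)

# A made-up toy data set: 50 draws from a normal distribution.
y <- rnorm(50, mean = 3, sd = 1.5)

# The model is written in NIMBLE's BUGS-like DSL inside nimbleCode().
code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)            # prior on the mean
  sigma ~ dunif(0, 10)              # prior on the standard deviation
  for (i in 1:N) {
    y[i] ~ dnorm(mu, sd = sigma)    # likelihood
  }
})

# nimbleMCMC() builds, compiles, and samples the model in a single call.
samples <- nimbleMCMC(code,
                      constants = list(N = length(y)),
                      data = list(y = y),
                      inits = list(mu = 0, sigma = 1),
                      niter = 2000, nburnin = 500)
summary(samples)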
Original Post: A First Look at NIMBLE

May 2018: “Top 40” New Packages

While looking over the 215 or so new packages that made it to CRAN in May, I was delighted to find several packages devoted to subjects a little bit out of the ordinary; for instance, bioacoustics analyzes audio recordings, freegroup looks at some abstract mathematics, RQEntangle computes quantum entanglement, stemmatology analyzes textual traditions, and treedater estimates clock rates for evolutionary models. I take this as evidence that R is expanding beyond its traditional strongholds of statistics and finance as people in other fields with serious analytic and computational requirements become familiar with the language. And, when I see a package from a philologist and scholar of “Ancient and Medieval Worlds”, I am persuaded to think that R is making a unique contribution to computational literacy. Below are my “Top 40” package picks for May 2018, organized into the following…
Original Post: May 2018: “Top 40” New Packages

Reading and analysing log files in the RRD database format

I have frequent conversations with R champions and Systems Administrators responsible for R, in which they ask how they can measure and analyze the usage of their servers. Among the many solutions to this problem, one of my favourites is to use an RRD database and RRDtool. From Wikipedia: RRDtool (round-robin database tool) aims to handle time series data such as network bandwidth, temperatures or CPU load. The data is stored in a circular buffer based database, thus the system storage footprint remains constant over time. RRDtool is a library written in C, with implementations that can also be accessed from the Linux command line. This makes it convenient for system development, but difficult for R users to extract and analyze this data. I am pleased to announce that I’ve been working on the rrd R package…
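As a rough sketch of what reading an RRD file from R might look like with the rrd package (the file path is hypothetical, and the function names reflect my understanding of the package’s describe_rrd()/read_rra() interface, so check the documentation for the exact signatures):

library(rrd)

# Hypothetical path to a round-robin database written by a monitoring tool.
rrd_file <- "/var/lib/collectd/rrd/localhost/cpu-0/cpu-user.rrd"

describe_rrd(rrd_file)   # list the round-robin archives (RRAs) in the file

# Read one archive into a data frame of timestamped values, ready for the tidyverse.
cpu <- read_rra(rrd_file,
                cf = "AVERAGE",     # consolidation function
                step = 300,         # one value every 5 minutes
                n_steps = 24 * 12)  # roughly the last 24 hours
head(cpu)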
Original Post: Reading and analysing log files in the RRD database format

Player Data for the 2018 FIFA World Cup

Official PDF FIFA has made several official player lists available, conveniently changing the format each time. For this exercise, I use the one from early June. The tabulizer package makes extracting information from tables included in a PDF document relatively easy. (The other (later) version of the official PDF is here. Strangely, the weight variable has been dropped.)

suppressMessages(library(tidyverse))
library(stringr)
suppressMessages(library(lubridate))
suppressMessages(library(cowplot))
# Note that I set warnings to FALSE because of some annoying (and intermittent)
# issues with RJavaTools.
library(tabulizer)

url <- "https://github.com/davidkane9/wc18/raw/master/fifa_player_list_1.pdf"
out <- extract_tables(url, output = "data.frame")

We now have a 32 element list, each item a data frame of information about the 23 players on each team. Let’s combine this information into a single tidy tibble.

# Note how bind_rows() makes it very easy to combine a list of compatible
# dataframes.
pdf_data <- bind_rows(out) %>%…
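The pipeline is truncated in the excerpt above. Purely to illustrate the pattern being described (stack the 32 per-team data frames and inspect the result), a minimal continuation might look like the sketch below; the cleaning steps from the original post are not reproduced here.

# Minimal sketch: combine the list of per-team data frames and take a look.
# Any renaming or type parsing done in the original post is omitted.
pdf_data <- bind_rows(out) %>%
  as_tibble()

dim(pdf_data)      # expect 32 teams x 23 players = 736 rows
glimpse(pdf_data)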
Original Post: Player Data for the 2018 FIFA World Cup

Monte Carlo Part Two

In a previous post, we reviewed how to set up and run a Monte Carlo (MC) simulation of future portfolio returns and growth of a dollar. Today, we will run that simulation many, many times and then visualize the results. Our ultimate goal is to build a Shiny app that allows an end user to build a custom portfolio, simulate returns, and visualize the results. If you just can’t wait, a link to the final Shiny app is available here. This post builds off the work we did previously. I won’t go through the logic again, but the code for building a portfolio, calculating returns, mean and standard deviation of returns, and using them for a simulation is here:

library(tidyquant)
library(tidyverse)
library(timetk)
library(broom)
library(highcharter)

symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")

prices <- getSymbols(symbols, src = 'yahoo', from = "2012-12-31", to =…
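The repeated-simulation step the post is building toward can be sketched in a few lines. This is a simplified base-R illustration of the same idea (draw monthly returns from a normal distribution with the portfolio’s mean and standard deviation, accumulate the growth of a dollar, repeat); the mean and standard deviation used below are placeholders rather than values from the post.

# Simplified sketch of one Monte Carlo path: draw monthly returns from a
# normal distribution and accumulate the growth of $1.
simulate_growth <- function(mean_return, sd_return, n_months) {
  returns <- rnorm(n_months, mean = mean_return, sd = sd_return)
  cumprod(c(1, 1 + returns))
}

set.seed(42)
n_sims <- 51
# The 0.005 / 0.04 monthly parameters are placeholders, not the portfolio's.
paths <- replicate(n_sims, simulate_growth(0.005, 0.04, 120))

matplot(paths, type = "l", lty = 1,
        xlab = "Month", ylab = "Growth of $1",
        main = "Simulated growth paths (illustrative parameters)")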
Original Post: Monte Carlo Part Two

Exploring R Packages with cranly

In a previous post, I showed a very simple example of using the R function tools::CRAN_package_db() to analyze information about CRAN packages. CRAN_package_db() extracts the metadata CRAN stores on all of its 12,000-plus packages and arranges it into a “database”, actually a complicated data frame in which some columns have vectors or lists as entries. It’s simple to run the function, and it doesn’t take very long on my MacBook Air. The following gives some insight into what’s contained in the data frame. Looking at a few rows and columns gives a feel for how complicated its structure is. So, having spent a little time learning how vexing working with this data can be, I was delighted when I discovered Ioannis Kosmidis’ cranly package during my March “Top 40” review. cranly is a very impressive package, built along tidy principles, that is…
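The workflow the excerpt describes might be sketched along the following lines; this assumes cranly’s clean_CRAN_db() and build_network() functions, so check the package documentation for the exact interface.

library(cranly)

# Metadata on every CRAN package, as a (complicated) data frame.
crandb <- tools::CRAN_package_db()
dim(crandb)

# cranly tidies the messy list-valued columns and builds a network of
# package (or author) relationships from them.
clean_db <- clean_CRAN_db(crandb)
pkg_net  <- build_network(clean_db, perspective = "package")

# Visualize the dependency neighbourhood of a single package.
plot(pkg_net, package = "ggplot2")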
Original Post: Exploring R Packages with cranly

April 2018: “Top 40” New Packages

Below are my “Top 40” picks from the approximately 212 new packages that made it to CRAN in April. They are organized into ten categories: Computational Methods, Data, Data Science, Machine Learning, Music, Science, Statistics, Time Series, Utilities, and Visualizations. Computational Methods diffeqr v0.1.1: Provides an interface to DifferentialEquations.jl, which offers high-performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more. There are vignettes for solving differential algebraic equations, delay differential equations, ordinary differential equations, and stochastic differential equations. SimRepeat v0.1.0: Provides functions to simulate correlated systems of statistical equations with multiple variable types. There is a vignette describing the underlying theory, as well as vignettes on systems with Continuous variables, multiple variable types, and hierarchical linear models. Data echor v0.1.0: Implements an interface to the United States…
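As a rough illustration of what calling DifferentialEquations.jl from R looks like, here is a toy exponential-growth ODE; this assumes the ode.solve() interface of the early diffeqr releases and a working Julia installation, so treat it as a sketch rather than a tested recipe.

library(diffeqr)

# One-time setup of the Julia backend may be required first:
# diffeqr::diffeq_setup()

# Toy ODE: du/dt = 1.01 * u, with u(0) = 0.5, solved on t in [0, 1].
f <- function(u, p, t) 1.01 * u
u0 <- 0.5
tspan <- list(0.0, 1.0)

sol <- diffeqr::ode.solve(f, u0, tspan)
plot(sol$t, sol$u, type = "l", xlab = "t", ylab = "u(t)")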
Original Post: April 2018: “Top 40” New Packages

Enterprise Dashboards with R Markdown

This is the second post in a series on enterprise dashboards. See our previous post, Enterprise-ready dashboards with Shiny and databases. We have been living with spreadsheets for so long that most office workers think it is obvious that spreadsheets generated with programs like Microsoft Excel make it easy to understand data and communicate insights. Everyone in a business, from the newest intern to the CEO, has had some experience with spreadsheets. But using Excel as the de facto analytic standard is problematic. Relying exclusively on Excel produces environments where it is almost impossible to organize and maintain efficient operational workflows. In addition to fostering low productivity, organizations risk profits and reputations in an age where insightful analyses and process control translate to a competitive advantage. Most organizations want better control over accessing, distributing, and processing data. You can use the…
Original Post: Enterprise Dashboards with R Markdown

2018 R Conferences

rstudio::conf 2018 and the New York R Conference are both behind us, but we are rushing headlong into the season for conferences focused on the R language and its applications. The European R Users Meeting (eRum) begins this coming Monday, May 14th, in Budapest with three days of workshops and talks. Headlined by R Core member Martin Mächler and fellow keynote speakers Achim Zeileis, Nathalie Villa-Vialaneix, Stefano Maria Iacus, and Roger Bivand, the program features an outstanding array of accomplished speakers including RStudio’s own Barbara Borges Ribeiro, Andrie de Vries, and Lionel Henry. Second only to useR! in longevity, the tenth consecutive R/Finance conference will be held in Chicago on June 1st and 2nd. Keynote speakers Norm Matloff, J.J. Allaire, and Li Deng head a strong program. Produced by the same committed crew of Chicago quants with the unwavering…
Original Post: 2018 R Conferences