The music of Les Mills Body Pump, with Spotify data

I am a runner but also a Body Pump enthusiast. Body Pump is a group fitness class from Les Mills, in which you train different muscle groups using a weighted bar – whose total weight you modulate with plates in order to adapt it to your fitness level and to the muscle group. Like R, Body Pump was created in New Zealand, what a wonderful country! Every three months, a new class is released, with new songs and choreographies. What doesn’t change is the muscle group trained in each of the 10 songs of each class. I’ve thought of analysing Body Pump data for a long time now but could never find what I was looking for, which was a dataset of number of “reps” by song, e.g. how many squats do you do in each squats…
Original Post: The music of Les Mills Body Pump, with Spotify data

Identify & Analyze Web Site Tech Stacks With rappalyzer

Modern websites are complex beasts. They house photo galleries, interactive visualizations, web fonts, analytics code and other diverse types of content. Despite the potential for diversity, many websites share similar “tech stacks”: the components that come together to make them what they are. These stacks consist of web servers (often with special capabilities), cache managers and a wide array of front-end web components. Unless a site goes to great lengths to cloak what it is using, most of these stack components leave a fingerprint: bits and pieces that we can assemble to identify them. Wappalyzer is one tool that we can use to take these fingerprints and match them against a database of known components. If you’re not familiar with that service, go there now and enter the URL of your own blog or…
Original Post: Identify & Analyze Web Site Tech Stacks With rappalyzer
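To make "matching fingerprints against a database of known components" concrete, here is a minimal base-R sketch. The fingerprint list, the `detect_stack()` helper and the header values are all hypothetical; this is neither rappalyzer’s API nor Wappalyzer’s real database format, and real fingerprints also inspect HTML, scripts, cookies and more than one header.

```r
# Hypothetical, hand-written fingerprints: which response header to inspect
# and a regular expression its value must match (not Wappalyzer's format).
fingerprints <- list(
  nginx   = list(header = "server",       pattern = "nginx"),
  Varnish = list(header = "via",          pattern = "varnish"),
  PHP     = list(header = "x-powered-by", pattern = "php")
)

# Hypothetical helper: return the names of components whose fingerprint matches.
detect_stack <- function(headers, fingerprints) {
  headers <- as.list(headers)                 # list so missing names give NULL
  names(headers) <- tolower(names(headers))   # HTTP header names are case-insensitive
  hits <- vapply(fingerprints, function(fp) {
    value <- headers[[fp$header]]
    !is.null(value) && grepl(fp$pattern, value, ignore.case = TRUE)
  }, logical(1))
  names(fingerprints)[hits]
}

# Made-up response headers for illustration.
headers <- c(Server = "nginx/1.12.1", `X-Powered-By` = "PHP/7.1.9",
             `Content-Type` = "text/html")
detect_stack(headers, fingerprints)  # "nginx" "PHP"
```

A component is only reported when its header is present and the pattern matches, which is why the missing `Via` header simply yields no Varnish hit.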

Data.Table by Example – Part 3

For this final post, I will cover some advanced topics and discuss how to use data tables within user-generated functions. Once again, let’s use the Chicago crime data.
library(data.table)
library(lubridate)  # provides mdy_hms()
dat = fread("rows.csv")
names(dat)
Let’s start by subsetting the data. The following code takes the first 50000 rows of the dat dataset, selects four columns, creates three new columns pertaining to the date, and then removes the original Date column. The output is saved to a new variable, and the user can see the first few rows of the new data table using brackets or the head function.
ddat = dat[1:50000, .(Date, value1, value2, value3)][, c("year", "month", "day") := .(year(mdy_hms(Date)), month(mdy_hms(Date)), day(mdy_hms(Date)))][, -c("Date")]
ddat[1:3]  # same as head(ddat, 3)
We can now do some intermediate calculations and suppress their output by using braces.
unique(ddat$month)
ddat[, { avg_val1 =…
Original Post: Data.Table by Example – Part 3
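The excerpt stops just as the braces technique begins, so here is a small sketch of the idea under stated assumptions: a made-up toy table stands in for ddat, and the ratio calculation is purely illustrative, not the post’s own computation. Inside `j`, a `{ ... }` block runs each expression in turn and returns only the value of the last one, so intermediate results are computed but never printed.

```r
library(data.table)

# Hypothetical toy table standing in for ddat (not the Chicago crime values).
ddat <- data.table(
  value1 = c(1, 2, 3, 4),
  value2 = c(10, 20, 30, 40),
  month  = c(1, 1, 2, 2)
)

# The braces evaluate several expressions per group; only the last one
# (the list) becomes the result, so avg_val1 and avg_val2 stay hidden.
res <- ddat[, {
  avg_val1 <- mean(value1)
  avg_val2 <- mean(value2)
  list(ratio = avg_val2 / avg_val1)
}, by = month]

print(res)  # one row per month, each with ratio 10
```

Using `list()` rather than `.()` as the last expression keeps the example unambiguous; both build the returned columns.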

tidytext 0.1.4

I am pleased to announce that tidytext 0.1.4 is now on CRAN! This release of our package for text mining using tidy data principles has an excellent collection of delightfulness in it. First off, all the important functions in tidytext now support non-standard evaluation through the tidyeval framework.
library(janeaustenr)
library(tidytext)
library(dplyr)
input_var <- quo(text)
output_var <- quo(word)
data_frame(text = prideprejudice) %>%
  unnest_tokens(!! output_var, !! input_var)
## # A tibble: 122,204 x 1
##    word
##    <chr>
##  1 pride
##  2 and
##  3 prejudice
##  4 by
##  5 jane
##  6 austen
##  7 chapter
##  8 1
##  9 it
## 10 is
## # … with 122,194 more rows
I have found the tidyeval framework useful already in my day job when writing functions using dplyr for complex data analysis tasks, so we are glad to have this support in tidytext. The older…
Original Post: tidytext 0.1.4

The Kelly Criterion — Does It Work?

This post will be about implementing and investigating the running Kelly Criterion, that is, a constantly adjusted Kelly Criterion that changes as a strategy realizes returns. For those not familiar with the Kelly Criterion, it’s the idea of adjusting a bet size to maximize a strategy’s long-term growth rate. Both Wikipedia and Investopedia have entries on the Kelly Criterion. Essentially, it’s about maximizing the long-run growth of a betting system by sizing bets higher when the edge is higher, and vice versa. There are two formulations of the Kelly Criterion: the Wikipedia result presents it as the mean over sigma squared, while the Investopedia definition is P - [(1 - P)/winLossRatio], where P is the probability of a winning bet and winLossRatio is the average win over the average loss. In any case, here are the two implementations. investoPediaKelly 0] 0] Let’s…
Original Post: The Kelly Criterion — Does It Work?
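The post’s own implementations did not survive in the excerpt, so the following is only a sketch of the two formulas it describes, plus a trailing-window version of the "running" idea. The function names, the window length `n` and the return series are hypothetical. The mean-over-variance form is applied to simple per-period returns with no risk-free adjustment, which is itself an assumption.

```r
# Hypothetical helper: continuous ("mean over sigma squared") Kelly fraction.
kelly_mean_var <- function(rets) {
  mean(rets) / var(rets)
}

# Hypothetical helper: Investopedia form, P - (1 - P) / winLossRatio.
kelly_investopedia <- function(rets) {
  wins   <- rets[rets > 0]
  losses <- rets[rets < 0]
  p <- length(wins) / length(rets)                  # share of winning periods
  win_loss_ratio <- mean(wins) / abs(mean(losses))  # average win / average loss
  p - (1 - p) / win_loss_ratio                      # NaN if the window has no wins or losses
}

# "Running" Kelly: recompute the fraction over the trailing n returns.
# The window length n is an arbitrary choice for illustration.
running_kelly <- function(rets, n, kelly_fun = kelly_investopedia) {
  vapply(seq_along(rets), function(i) {
    if (i < n) return(NA_real_)                     # not enough history yet
    kelly_fun(rets[(i - n + 1):i])
  }, numeric(1))
}

rets <- c(0.02, -0.01, 0.03, -0.02, 0.01)  # made-up returns
kelly_investopedia(rets)    # about 0.3
kelly_mean_var(rets)        # about 13.95
running_kelly(rets, n = 3)  # NA, NA, then about 0.533, 0, 0.333
```

The two formulas can disagree wildly on the same data, as they do here, which is one reason the running version is worth investigating rather than trusting either number blindly.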

Because it's Friday: Big Little Songs

I really enjoyed the HBO series Big Little Lies: fantastic story, writing and acting, and while I do wish it had been set in the original Sydney, the transplanted Monterey location was admittedly a great fit. The series also has an amazing soundtrack, including an eminently re-listenable title track. In fact, one of the best things about the series was being introduced to the music of Michael Kiwanuka. It’s a unique mix of Philadelphia soul, heavy on the choir, and with an infusion of disco strings. The title track from the album, Love and Hate, is on heavy rotation in my playlists now. That’s all from us here at the blog for this week. Have a great weekend, and we’ll be back on Monday.
Original Post: Because it's Friday: Big Little Songs

R 3.4.2 is released (with several bug fixes and a few performance improvements)

R 3.4.2 (codename “Short Summer”) was released yesterday. You can get the latest binary versions from here (or the .tar.gz source code from here). As mentioned by David Smith, R 3.4.2 includes a performance improvement for names: c() and unlist() are now more efficient in constructing the names(.) of their return value, thanks to a proposal by Suharto Anggono (PR#17284). The full list of bug fixes and new features is provided below. Thank you, Duncan Murdoch! On a related note, following the announcement of R 3.4.2, Duncan Murdoch wrote yesterday: I’ve just finished the Windows build of R 3.4.2. It will make it to CRAN and its mirrors over the next few hours. This is the last binary release that I will be producing. I’ve been building them for about 15 years, and it’s time to retire. Builds using different tools and scripts are available from  I’ll be putting my…
Original Post: R 3.4.2 is released (with several bug fixes and a few performance improvements)
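For readers wondering which names the improvement concerns, here is a small illustration of the names that c() and unlist() generate for their results. It only shows the resulting names; it is not a benchmark of the R 3.4.2 speed-up, and the example vectors are arbitrary.

```r
# unlist() names each element after its list component, adding a running
# index when that component has more than one element.
x <- unlist(list(a = 1:2, b = 3))
names(x)  # "a1" "a2" "b"

# c() joins an outer name and the inner names with a dot.
y <- c(a = 1, b = c(x = 2, y = 3))
names(y)  # "a" "b.x" "b.y"
```

Building these composite names is extra work on every call with named inputs, which is why making it cheaper benefits so much everyday code.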

Top 10 Videos on Machine Learning in Finance

Talks, tutorials and playlists – you could not get a more gentle introduction to Machine Learning (ML) in Finance. Got a quick 4 minutes or ready to study for hours on end? These videos cover all skill levels and time constraints!
Original Post: Top 10 Videos on Machine Learning in Finance