Le Monde puzzle [#1053]

An easy arithmetic Le Monde mathematical puzzle again: if coins come in units of 1, x, and y, what is the optimal value of (x,y) that minimises the maximal number of coins needed to represent an arbitrary price between 1 and 149? If the number of units is now four, what is the optimal choice? The first question is fairly easy to code (coinz <- function(x,y){ z=(1:149) if (y …) and returns M=12 as the maximal number of coins, corresponding to x=4 and y=22, and to a price tag of 129. For the second question, one unit is necessarily 1 (!) and there is just an extra loop to the above, which returns M=8, with the other units taking several possible values:
[1] 40 11 3
[1] 41 11 3
[1] 55 15 4
[1] 56 15 4
A quick search revealed that this problem (or…
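The post's own coinz() code is cut off in this excerpt, so here is a hedged stand-in in R rather than the author's code. min_coins() and best_pair() are hypothetical helpers: the first computes the fewest coins for every price exactly by dynamic programming (a greedy choice is not optimal for every coin system), and the second scans all pairs 1 < x < y <= 149 for the smallest worst case, which is the quantity the puzzle minimises.

```r
# Fewest coins needed for each price 1..maxp, by dynamic programming.
# Hypothetical helper, not the post's coinz(); assumes 1 is among the units.
min_coins <- function(units, maxp = 149) {
  best <- c(0, rep(Inf, maxp))   # best[p + 1] = fewest coins for price p
  for (p in 1:maxp) {
    u <- units[units <= p]       # units usable for price p
    best[p + 1] <- min(best[p - u + 1]) + 1
  }
  best[-1]                       # drop price 0
}

# Exhaustive search over 1 < x < y <= maxp for the smallest worst-case count
best_pair <- function(maxp = 149) {
  M <- Inf
  sol <- NULL
  for (x in 2:(maxp - 1)) {
    for (y in (x + 1):maxp) {
      worst <- max(min_coins(c(1, x, y), maxp))
      if (worst < M) {           # strict: keeps the first pair found on ties
        M <- worst
        sol <- c(x = x, y = y)
      }
    }
  }
  list(M = M, units = sol)
}

counts <- min_coins(c(1, 4, 22))
max(counts)        # 12: worst case over prices 1..149
which.max(counts)  # 129: first price needing 12 coins
```

For units (1, 4, 22) this reproduces the 12-coin worst case at 129 reported in the post. best_pair() evaluates about 11,000 pairs in an interpreted loop, so it can take a while; the four-unit question needs one more nested loop, as the post notes.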
Original Post: Le Monde puzzle [#1053]

PYPL Language Rankings: Python ranks #1, R at #7 in popularity

The new PYPL Popularity of Programming Languages (June 2018) index ranks Python at #1 and R at #7. Like the similar TIOBE language index, the PYPL index uses Google search activity to rank language popularity. PYPL, however, focuses on people searching for tutorials in the respective languages as a proxy for popularity. By that measure, Python has always been more popular than R (as you’d expect from a more general-purpose language), but both have been growing at similar rates. The chart below includes the three data-oriented languages tracked by the index (and note the vertical scale is logarithmic). Another language ranking was also released recently: the annual KDnuggets Analytics, Data Science and Machine Learning Poll. These rankings, however, are derived not from search trends but from the answers of self-selected poll respondents, which perhaps explains the presence of RapidMiner at the #2 spot. Related…
Original Post: PYPL Language Rankings: Python ranks #1, R at #7 in popularity

Big News: vtreat 1.2.0 is Available on CRAN, and it is now Big Data Capable

We here at Win-Vector LLC have some really big news that we would like the R community’s help in sharing. vtreat version 1.2.0 is now available on CRAN, and this version of vtreat can now implement its data cleaning and preparation steps on databases and big data systems such as Apache Spark. vtreat is a very complete and rigorous tool for preparing messy real-world data for supervised machine-learning tasks. It implements a technique we call “safe y-aware processing” using cross-validation or stacking techniques. It is very easy to use: you show it some data and it designs a data transform for you. Thanks to the rquery package, this data preparation transform can now be applied directly to databases or big data systems such as PostgreSQL, Amazon Redshift, Apache Spark, or Google BigQuery. Or, thanks to the data.table and rqdatatable packages, even…
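vtreat's actual implementation is not shown in this excerpt, so the following is only a base-R illustration of the general idea behind cross-validated y-aware encoding, not vtreat's code or API. A categorical column is replaced by how far each level shifts the mean outcome, and every row's value is estimated only from folds that exclude that row, so the encoding never sees its own outcome. cv_impact_code() and its fold scheme are hypothetical choices made for this sketch.

```r
# Cross-validated impact coding of one categorical column: a base-R sketch of
# the general y-aware idea, NOT vtreat's implementation or API.
cv_impact_code <- function(x, y, k = 3, seed = 1) {
  set.seed(seed)
  x <- as.character(x)
  fold <- sample(rep_len(1:k, length(y)))  # random fold label per row
  out <- numeric(length(y))
  for (f in 1:k) {
    train <- fold != f
    grand <- mean(y[train])                        # grand mean on other folds
    lvl <- tapply(y[train], x[train], mean)        # level means on other folds
    lvl <- setNames(as.numeric(lvl), names(lvl))
    est <- unname(lvl[x[!train]]) - grand          # NA for levels unseen in training
    est[is.na(est)] <- 0                           # unseen level: no impact
    out[!train] <- est
  }
  out
}

# Toy usage: level "b" has larger outcomes than level "a"
x <- rep(c("a", "b"), each = 6)
y <- c(1, 2, 1, 2, 1, 2, 5, 6, 5, 6, 5, 6)
cv_impact_code(x, y)
```

Rows of level "a" get negative codes and rows of level "b" positive ones; a level absent from the training folds falls back to zero rather than to an estimate built from its own outcome.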
Original Post: Big News: vtreat 1.2.0 is Available on CRAN, and it is now Big Data Capable

Neural Networks Are Essentially Polynomial Regression

You may be interested in my new arXiv paper, joint work with Xi Cheng, an undergraduate at UC Davis (now heading to Cornell for grad school); Bohdan Khomtchouk, a postdoc in biology at Stanford; and Pete Mohanty, a Science, Engineering & Education Fellow in statistics at Stanford. The paper is of a provocative nature, and we welcome feedback. A summary of the paper is: We present a very simple, informal mathematical argument that neural networks (NNs) are in essence polynomial regression (PR). We refer to this as NNAEPR. NNAEPR implies that we can use our knowledge of the “old-fashioned” method of PR to gain insight into how NNs, widely viewed somewhat warily as a “black box”, work inside. One such insight is that the outputs of an NN layer will be prone to multicollinearity, with the problem…
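The paper's own argument is informal and general; the R snippet below is only our toy special case, not taken from the paper. With a quadratic activation s(t) = t^2 (an assumption chosen because the equivalence is then exact), a single hidden unit is literally a degree-2 polynomial in its inputs, so an ordinary polynomial regression recovers it. The weights are arbitrary illustrative values, and the last line hints at the multicollinearity point: raw polynomial terms tend to be strongly correlated.

```r
# Toy illustration (ours, not from the paper): with a quadratic activation
# s(t) = t^2, one hidden unit is exactly a degree-2 polynomial in its inputs.
set.seed(42)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
w1 <- 0.7; w2 <- -1.3; b <- 0.2          # arbitrary, hypothetical weights

unit_out <- (w1 * x1 + w2 * x2 + b)^2    # hidden unit's output

# Polynomial regression on the expanded terms recovers the unit exactly
fit <- lm(unit_out ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
implied <- c(b^2, 2 * w1 * b, 2 * w2 * b, w1^2, w2^2, 2 * w1 * w2)
all.equal(unname(coef(fit)), implied)    # coefficients match the expansion

# Raw polynomial terms are strongly correlated (multicollinearity)
cor(x1, x1^2)
```

With other activations the correspondence is only approximate, which is where the paper's informal argument comes in.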
Original Post: Neural Networks Are Essentially Polynomial Regression

Intro To Time Series Analysis Part 2: Exercises

In the exercises below, we will explore time series analysis further. The previous exercise set is here; please work through the sets in sequence. Answers to these exercises are available here.
Exercise 1: Load the AirPassengers data, check its class, and look at the start and end of the series.
Exercise 2: Check the cycle of the AirPassengers time series.
Exercise 3: Create the lag plot using gglagplot from the forecast package, and check how the relationship changes as the lag increases.
Exercise 4: Also plot the correlation for each of the lags; you can see that the correlation drops when the lag goes above 6, climbs up again at 12, and drops again at 18.
Exercise 5: Plot the histogram of AirPassengers using gghistogram from forecast.
Exercise 6: Use tsdisplay to plot the autocorrelation, the time series, and the partial autocorrelation together in the same plot.
Exercise 7: Find…
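A possible starting point for Exercises 1 to 6 in R, hedged: this is our sketch, not the linked answer set, which may do things differently. The first part uses only base R; the forecast functions named in the exercises (gglagplot, gghistogram, tsdisplay), plus forecast's ggAcf for the correlation plot in Exercise 4 (our choice), run only if the package is installed.

```r
# Exercises 1-2 in base R: class, time span and seasonal cycle
ap <- datasets::AirPassengers
class(ap)                  # a "ts" object
start(ap); end(ap)         # first and last (year, month)
frequency(ap)              # observations per cycle (monthly data)
cycle(ap)                  # position of each observation within the year

# Exercise 4 stand-in: sample autocorrelations for lags 1..24
ac <- acf(ap, lag.max = 24, plot = FALSE)
round(drop(ac$acf)[-1], 2) # drop lag 0

# Exercises 3 to 6 with the forecast package, when it is installed
if (requireNamespace("forecast", quietly = TRUE)) {
  print(forecast::gglagplot(ap))     # lag plots (Exercise 3)
  print(forecast::ggAcf(ap))         # ACF plot (Exercise 4)
  print(forecast::gghistogram(ap))   # histogram (Exercise 5)
  forecast::tsdisplay(ap)            # series + ACF + PACF (Exercise 6)
}
```

Reading the printed autocorrelations against the lag number is one way to check the pattern described in Exercise 4.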
Original Post: Intro To Time Series Analysis Part 2: Exercises

a chain of collapses

A quick resolution, during a committee meeting (!), of a short riddle from the Riddler: 36 houses stand in a row and collapse at times t=1,2,...,36. In addition, once a house collapses, its neighbours, if still standing, collapse at the next time unit. What are the shortest and longest lifespans of this row? Since the house scheduled for time i collapses by time i at the latest, the longest lifespan is 36, which can be achieved even with the extra rule when the collapse times are perfectly ordered. For the shortest lifespan, I ran a short R script implementing the rules and monitoring the minimum, which found 7 as the minimal lifespan over 10⁵ draws. However, with an optimal ordering, one house plus one or two neighbours of the most recently collapsed, leading to a maximal number of collapsed houses after k time…
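The author's R code is not shown here, so the following is our own hedged reformulation rather than a copy of it. If sigma[k] is the scheduled collapse time of house k, a house falls at its own time or one unit after a neighbour falls, whichever comes first; unrolling that rule gives the closed form T_j = min_k (sigma_k + |j - k|), and the lifespan is the largest T_j. collapse_times() and lifespan() are hypothetical names.

```r
# Collapse time of house j: its own scheduled time, or one unit after a
# neighbour falls, whichever comes first. Unrolling that recursion gives
# T_j = min_k (sigma_k + |j - k|). (Our reformulation, not the post's code.)
collapse_times <- function(sigma) {
  n <- length(sigma)
  d <- abs(outer(1:n, 1:n, "-"))            # d[j, k] = |j - k|
  apply(sweep(d, 2, sigma, "+"), 1, min)    # row j: min_k sigma_k + |j - k|
}
lifespan <- function(sigma) max(collapse_times(sigma))

lifespan(1:36)    # 36: collapse times ordered along the row

# Monte Carlo search for short lifespans over random schedules
set.seed(1)
sims <- replicate(1e4, lifespan(sample(36)))
min(sims)
```

Random search only gives an upper bound on the shortest lifespan; the counting argument the post starts at the end (how many houses can have collapsed after k time units) is what pins down the true minimum.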
Original Post: a chain of collapses

Searching For Unicorns (And Other NBA Myths)

A visual exploration of the 2017-2018 NBA landscape The modern NBA landscape is rapidly changing. Steph Curry has redefined the lead guard prototype with jaw-dropping shooting range coupled with unprecedented scoring efficiency for a guard. The likes of Marc Gasol, Al Horford and Kristaps Porzingis are paving the way for a younger generation of modern big men as defensive rim protectors who can space the floor on offense as three-point threats. Then there are the new-wave facilitators – LeBron James, Draymond Green, Ben Simmons – enormous athletes who can guard any position on defense and push the ball down court in transition. For fans, analysts and NBA front offices alike, these are the prototypical players that make our mouths water. So what do they have in common? For one, they are elite statistical outliers in at least two categories, and…
Original Post: Searching For Unicorns (And Other NBA Myths)