You have created your first Linear Regression Model. Have you validated the assumptions?

Linear Regression is an excellent starting point for Machine Learning, but it is a common mistake to focus only on p-values and R-squared values when judging the validity of a model. Here we examine the underlying assumptions of Linear Regression, which need to be validated before the model is applied.
Original Post: You have created your first Linear Regression Model. Have you validated the assumptions?
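
The full text of the post is not available here, so the specific checks it covers are unknown. As a hedged illustration only, the sketch below fits a one-variable least-squares line and computes two commonly used residual diagnostics: the Durbin-Watson statistic (values near 2 suggest little first-order autocorrelation in the errors) and the correlation between absolute residuals and fitted values (a crude signal of non-constant variance). The function names and the synthetic data are assumptions made for this example, not the author's method.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y, slope, intercept):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(res):
    """Durbin-Watson statistic; always lies between 0 and 4."""
    num = sum((b - a) ** 2 for a, b in zip(res, res[1:]))
    den = sum(r * r for r in res)
    return num / den


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


def make_data(n=50, seed=0):
    """Synthetic data (hypothetical): y = 2x + 1 plus unit Gaussian noise."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    y = [2.0 * xi + 1.0 + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = make_data()
    slope, intercept = fit_line(x, y)
    res = residuals(x, y, slope, intercept)
    fitted = [intercept + slope * xi for xi in x]
    print("slope, intercept:", slope, intercept)
    print("Durbin-Watson:", durbin_watson(res))
    print("corr(|residual|, fitted):", pearson([abs(r) for r in res], fitted))
```

These numbers are screening aids rather than formal tests; residual plots are usually inspected alongside them.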

The 10 Statistical Techniques Data Scientists Need to Master

The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.
Original Post: The 10 Statistical Techniques Data Scientists Need to Master

How Bayesian Networks Are Superior in Understanding Effects of Variables

Bayes Nets have remarkable properties that make them better than many traditional methods in determining variables’ effects. This article explains the principal advantages.
Original Post: How Bayesian Networks Are Superior in Understanding Effects of Variables

Calculating the house edge of a slot machine, with R

Modern slot machines (fruit machines, pokies, or whatever those electronic gambling devices are called in your part of the world) are designed to be addictive. They’re also usually quite complicated, with a bunch of features that affect the payout of a spin: multiple symbols with different pay scales, wildcards, scatter symbols, free spins, jackpots … the list goes on. Many machines also let you play multiple combinations at the same time (20 lines, or 80, or even more with just one spin). All of this complexity is designed to make it hard for you, the player, to judge the real odds of success. But rest assured: in the long run, you always lose. All slot machines are designed to have a “house edge” (the percentage of player bets retained by the machine in the long run) greater than…
Original Post: Calculating the house edge of a slot machine, with R
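
The house edge defined in the excerpt, the long-run share of bets the machine keeps, is one minus the expected return per unit bet. The original post works in R; the Python sketch below is only a toy illustration of that definition. The three-reel machine, its reel strip, and its pay table are all hypothetical, and real machines (with wildcards, scatters, free spins, and multiple lines) are far more complicated.

```python
from fractions import Fraction
from itertools import product

# Hypothetical reel strip: symbol -> number of stops on each of three
# identical, independent reels (10 stops per reel in total).
REEL = {"cherry": 5, "bar": 3, "seven": 2}

# Hypothetical pay table: amount returned per 1-unit bet when all three
# reels show the same symbol. Any other outcome returns nothing.
PAYOUT = {"cherry": 3, "bar": 10, "seven": 30}


def house_edge(reel, payout, n_reels=3):
    """Exact house edge: 1 minus the expected return per unit bet."""
    total_stops = sum(reel.values())
    expected_return = Fraction(0)
    # Enumerate every combination of symbols across the reels.
    for combo in product(reel, repeat=n_reels):
        prob = Fraction(1)
        for symbol in combo:
            prob *= Fraction(reel[symbol], total_stops)
        if len(set(combo)) == 1:  # all reels match
            expected_return += prob * payout[combo[0]]
    return 1 - expected_return


if __name__ == "__main__":
    edge = house_edge(REEL, PAYOUT)
    print(f"House edge: {float(edge):.1%}")
```

Exact fractions are used so the result is not affected by floating-point rounding. For a machine too complex to enumerate, simulating many spins and averaging the return would be the usual alternative.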

Role Playing with Probabilities: The Importance of Distributions

by Jocelyn Barker, Data Scientist at Microsoft

I have a confession to make. I am not just a statistics nerd; I am also a role-playing games geek. I have been playing Dungeons and Dragons (DnD) and its variants since high school. While playing with my friends the other day it occurred to me, DnD may have some lessons to share in my job as a data scientist. Hidden in its dice rolling mechanics is a perfect little experiment for demonstrating at least one reason why practitioners may resist using statistical methods even when we can demonstrate a better average performance than previous methods. It is all about distributions. While our averages may be higher, the distribution of individual data points can be disastrous.

Why Use Role-Playing Games as an Example?

Partially because it means I get to think about one…
Original Post: Role Playing with Probabilities: The Importance of Distributions

30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets

This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
Original Post: 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets