Linear Regression is an excellent starting point for Machine Learning, but it is a common mistake to focus only on p-values and R-squared values when judging the validity of a model. Here we examine the underlying assumptions of Linear Regression, which need to be validated before applying the model.

# Statistics

## The 10 Statistical Techniques Data Scientists Need to Master

The author presents 10 statistical techniques that a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.

## How Bayesian Networks Are Superior in Understanding Effects of Variables

Bayes Nets have remarkable properties that make them better than many traditional methods in determining variables’ effects. This article explains the principal advantages.

## Calculating the house edge of a slot machine, with R

Modern slot machines (fruit machines, pokies, or whatever those electronic gambling devices are called in your part of the world) are designed to be addictive. They’re also usually quite complicated, with a bunch of features that affect the payout of a spin: multiple symbols with different pay scales, wildcards, scatter symbols, free spins, jackpots … the list goes on. Many machines also let you play multiple combinations at the same time (20 lines, or 80, or even more with just one spin). All of this complexity is designed to make it hard for you, the player, to judge the real odds of success. But rest assured: in the long run, you always lose. All slot machines are designed to have a “house edge” (the percentage of player bets retained by the machine in the long run) greater than…
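To make the definition of the house edge concrete, here is a minimal sketch in Python (the original post works in R). The reel layout and paytable are entirely hypothetical, chosen only so the arithmetic is easy to follow; a real machine has far more symbols, lines, and features. The house edge is computed as one minus the expected payout per unit bet, by enumerating every equally likely spin.

```python
from fractions import Fraction
from itertools import product

# Hypothetical reel strip: each of the three reels has these 10 stops.
# This layout is an illustrative assumption, not taken from any real machine.
REEL = ["cherry"] * 5 + ["bell"] * 3 + ["seven"] * 2

# Hypothetical paytable: amount paid per 1 unit bet for three of a kind.
PAYTABLE = {"cherry": 3, "bell": 10, "seven": 40}


def payout(spin):
    """Return the payout for one spin: three matching symbols pay, all else pays 0."""
    first = spin[0]
    if all(symbol == first for symbol in spin):
        return PAYTABLE[first]
    return 0


def house_edge(reel=REEL, n_reels=3):
    """Exact house edge: 1 minus the expected payout per unit bet."""
    outcomes = list(product(reel, repeat=n_reels))  # every equally likely spin
    expected_return = Fraction(sum(payout(s) for s in outcomes), len(outcomes))
    return 1 - expected_return


edge = house_edge()
print(f"House edge: {float(edge):.1%}")
```

With this toy paytable the machine returns 965 units for every 1000 possible spins, so it keeps 3.5% of bets in the long run. Enumeration is only feasible because the example is tiny; for a machine with many reels, lines, and bonus features, simulation is the more practical route.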

## Role Playing with Probabilities: The Importance of Distributions

by Jocelyn Barker, Data Scientist at Microsoft

I have a confession to make. I am not just a statistics nerd; I am also a role-playing games geek. I have been playing Dungeons and Dragons (DnD) and its variants since high school. While playing with my friends the other day, it occurred to me that DnD may have some lessons to share in my job as a data scientist. Hidden in its dice rolling mechanics is a perfect little experiment for demonstrating at least one reason why practitioners may resist using statistical methods even when we can demonstrate a better average performance than previous methods. It is all about distributions. While our averages may be higher, the distribution of individual data points can be disastrous.

### Why Use Role-Playing Games as an Example?

Partially because it means I get to think about one…
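The point about averages versus distributions can be illustrated with a small dice calculation. This is a sketch under assumptions of my own, not necessarily the comparison the author goes on to make: a "steady" roll of 3d6 is set against a "swingy" roll of 1d20+1, and a total of 4 or less is arbitrarily treated as a disaster. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(n_dice, sides, bonus=0):
    """Exact probability of each total when rolling n_dice fair dice plus a flat bonus."""
    rolls = product(range(1, sides + 1), repeat=n_dice)
    counts = Counter(sum(roll) + bonus for roll in rolls)
    total = sides ** n_dice
    return {value: Fraction(count, total) for value, count in counts.items()}


def mean(dist):
    """Expected value of a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def prob_at_most(dist, threshold):
    """Probability of rolling the threshold or lower (the 'disaster' region)."""
    return sum(prob for value, prob in dist.items() if value <= threshold)


steady = distribution(3, 6)            # 3d6
swingy = distribution(1, 20, bonus=1)  # 1d20+1: higher average, flat distribution

for name, dist in [("3d6", steady), ("1d20+1", swingy)]:
    print(name, "mean:", float(mean(dist)),
          "P(total <= 4):", float(prob_at_most(dist, 4)))
```

The 1d20+1 roll has the higher average (11.5 against 10.5), yet it lands in the disaster region 15% of the time, compared with under 2% for 3d6. A method can win on the mean and still be the one that fails badly far more often.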

## Conjoint Analysis: A Primer

Conjoint is another of those things everyone talks about but many are confused about…

## Monty Hall finally chooses the exit door

Monty Hall, the game show host, died last week. He was the host of the popular show “Let’s Make a Deal”, where contestants try to guess which one of 3 doors hides a valuable prize.

## Statistical Mistakes Even Scientists Make

Scientists are all experts in statistics, right? Wrong.

## 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets

This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.

## How To Lie With Numbers

It takes less effort to lie without numbers, but there are now more numbers and more ways to lie with them than ever before. Poor Reverend Bayes, who understood the true meaning of “evidence”.

