4 Common Data Fallacies That You Need To Know

In this post you will find a list of common data fallacies that lead to incorrect conclusions and poor decision-making with data. Here you will find great resources and information so that you can always be reminded of these fallacies when you’re working with data.
Original Post: 4 Common Data Fallacies That You Need To Know

Stop Doing Fragile Research

If you develop methods for data analysis, you might only be conducting gentle tests of your method on idealized data. This leads to “fragile research,” which breaks when released into the wild. Here, I share 3 ways to make your methods robust.
Original Post: Stop Doing Fragile Research

Understanding overfitting: an inaccurate meme in supervised learning

“Applying cross-validation prevents overfitting” is a popular meme, but it is not actually true – it is more of an urban legend. We examine what is true and how overfitting is different from overtraining.
Original Post: Understanding overfitting: an inaccurate meme in supervised learning
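The post’s distinction can be illustrated with a toy sketch (the `memorizer` model below is a hypothetical example, not from the article): cross-validation can *measure* overfitting by scoring a model on held-out data, but it does not *prevent* the model from overfitting in the first place.

```python
# A deliberately overfit "model": it memorizes the training set exactly,
# so training error is zero while generalization is poor.
def memorizer(train_X, train_y):
    table = dict(zip(train_X, train_y))
    default = sum(train_y) / len(train_y)  # fall back to the mean on unseen inputs
    return lambda x: table.get(x, default)

X = [1, 2, 3, 4]
y = [10.0, 20.0, 30.0, 40.0]
model = memorizer(X, y)

# Zero error on every training point.
train_error = sum(abs(model(x) - t) for x, t in zip(X, y)) / len(X)

# On an unseen input the model falls back to the training mean (25.0),
# far from the true value 50.0 for x = 5.
holdout_error = abs(model(5) - 50.0)

print(train_error)    # 0.0  -- perfect fit on training data
print(holdout_error)  # 25.0 -- a held-out fold would detect this gap, not remove it
```

Cross-validation would report the large gap between these two errors, which is exactly why it is a diagnostic tool rather than a cure.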

Making Predictive Models Robust: Holdout vs Cross-Validation

The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine the pros and cons of two popular validation strategies: the holdout strategy and k-fold cross-validation.
Original Post: Making Predictive Models Robust: Holdout vs Cross-Validation
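The trade-off between the two strategies can be sketched in a few lines of pure Python (the helper names below are illustrative, not from the post): a single holdout split is cheap but its score depends on one random partition, while k-fold uses every point for validation exactly once at roughly k times the cost.

```python
import random

def holdout_split(n, test_frac=0.2, seed=0):
    """One random train/test partition of indices 0..n-1 (single holdout)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

def kfold_folds(n, k=5):
    """Split indices 0..n-1 into k validation folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

train, test = holdout_split(100)  # one 80/20 split, one performance estimate
folds = kfold_folds(100, k=5)     # five folds: five estimates, every point validated once
print(len(train), len(test))      # 80 20
print([len(f) for f in folds])    # [20, 20, 20, 20, 20]
```

With k-fold, each fold in turn serves as the validation set while the rest trains the model, so the final score is an average over k runs rather than the luck of one split.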

Sound Data Science: Avoiding the Most Pernicious Prediction Pitfall

In this excerpt from the updated edition of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Revised and Updated Edition, I show that, although the explosive popularity of data science and predictive analytics promises meteoric value, a common misapplication readily backfires. The number crunching only delivers if a fundamental, yet often omitted, failsafe is applied.

Prediction is booming. Data scientists have the “sexiest job of the 21st century” (as Professor Thomas Davenport and US Chief Data Scientist D.J. Patil declared in 2012). Fueled by the data tsunami, we’ve entered a golden age of predictive discoveries. A frenzy of analysis churns out a bonanza of colorful, valuable, and sometimes surprising insights:

• People who “like” curly fries on Facebook are more intelligent.
• Typing with proper capitalization indicates creditworthiness.
• Users of the Chrome and Firefox browsers make better employees.…
Original Post: Sound Data Science: Avoiding the Most Pernicious Prediction Pitfall

4 Reasons Your Machine Learning Model is Wrong (and How to Fix It)

By Bilal Mahmood, Bolt. There are a number of machine learning models to choose from. We can use Linear Regression to predict a value, Logistic Regression to classify distinct outcomes, and Neural Networks to model non-linear behaviors. When we build these models, we always use a set of historical data to help our machine learning algorithms learn the relationship between a set of input features and a predicted output. But even if a model accurately predicts values from historical data, how do we know it will work as well on new data? Or, more plainly, how do we evaluate whether a machine learning model is actually “good”? In this post we’ll walk through some common scenarios where a seemingly good machine learning model may still be wrong. We’ll show how you can evaluate these issues by assessing metrics of bias vs.…
Original Post: 4 Reasons Your Machine Learning Model is Wrong (and How to Fix It)
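The bias-vs-variance check the excerpt alludes to can be sketched as a rule of thumb (the function and thresholds below are illustrative assumptions, not the article’s method): compare the model’s training error with its validation error.

```python
def diagnose(train_error, val_error, bias_tol=0.10, gap_tol=0.05):
    """Crude diagnosis: high training error suggests bias (underfitting);
    a large train/validation gap suggests variance (overfitting)."""
    if train_error > bias_tol:
        return "high bias: model is too simple even for the training data"
    if val_error - train_error > gap_tol:
        return "high variance: model fits training data but not new data"
    return "errors are low and close: model generalizes reasonably"

print(diagnose(train_error=0.02, val_error=0.25))  # high variance case
print(diagnose(train_error=0.30, val_error=0.32))  # high bias case
print(diagnose(train_error=0.02, val_error=0.04))  # healthy case
```

The tolerances are problem-dependent; the point is the pattern: fix high bias with a more expressive model or better features, and fix high variance with more data, regularization, or a simpler model.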