On the biases in data

Whether we’re building statistical models, training machine learning recognizers, or developing AI systems, we start with data. And while the suitability of a data set is, lamentably, sometimes measured only by its size, it is always important to reflect on where those data come from. Data are not neutral: the data we choose to use have a profound impact on the systems we build. A recent article on Microsoft’s AI Blog discusses the biases inherent in many data sets. As the researcher quoted in the piece puts it: “The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets. What is interesting about training datasets is that they will always bear the marks of history, that history will…” A short composition-audit sketch follows the link below.
Original Post: On the biases in data
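
One simple way to begin that reflection is to audit how the groups a dataset claims to represent are actually distributed in it. The sketch below is a minimal illustration, assuming a pandas DataFrame with a hypothetical demographic column named "group"; the column name and the 10% threshold are my illustrative choices, not something taken from the original post.

```python
# A minimal sketch of a dataset-composition audit (assumed setup, not from
# the original post): report group shares and flag under-represented groups.
import pandas as pd

def audit_composition(df: pd.DataFrame, column: str, threshold: float = 0.10) -> pd.Series:
    """Return the share of each group in `column`, flagging those below `threshold`."""
    shares = df[column].value_counts(normalize=True)
    small = shares[shares < threshold]
    if not small.empty:
        print(f"Groups below {threshold:.0%} of the data: {list(small.index)}")
    return shares

# Toy example: a heavily skewed sample.
df = pd.DataFrame({"group": ["a"] * 95 + ["b"] * 5})
print(audit_composition(df, "group"))
```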

Understanding Bias in Peer Review

Posted by Andrew Tomkins, Director of Engineering, and William D. Heavlin, Statistician, Google Research. In the 1600s, a series of practices came into being known collectively as the “scientific method.” These practices encoded verifiable experimentation as a path to establishing scientific fact. Scientific literature arose as a mechanism to validate and disseminate findings, and standards of scientific peer review developed as a means to control the quality of entrants into this literature. Over the course of the development of peer review, one key structural question remains unresolved to the current day: should the reviewers of a piece of scientific work be made aware of the identity of the authors? Those in favor argue that such additional knowledge may allow the reviewer to set the work in perspective and evaluate it more completely. Those opposed argue instead that the reviewer may form an…
Original Post: Understanding Bias in Peer Review

KDnuggets™ News 17:n45, Nov 29: New Poll: Data Science Methods Used? Deep Learning Specialization: 21 Lessons Learned

Also in this issue: The 10 Statistical Techniques Data Scientists Need to Master; Did Spark Really Kill Hadoop?; and A Framework for Textual Data Science.
Original Post: KDnuggets™ News 17:n45, Nov 29: New Poll: Data Science Methods Used? Deep Learning Specialization: 21 Lessons Learned

You have created your first Linear Regression Model. Have you validated the assumptions?

Linear regression is an excellent starting point for machine learning, but it is a common mistake to focus only on p-values and R-squared when judging a model’s validity. Here we examine the underlying assumptions of linear regression, which need to be checked before the model is applied; a brief diagnostic sketch follows the link below.
Original Post: You have created your first Linear Regression Model. Have you validated the assumptions?
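
As a companion to the post above, here is a minimal sketch of post-fit assumption checks for an ordinary least squares model using statsmodels and scipy. The synthetic data, the particular tests chosen, and the usual 0.05 cutoff are illustrative assumptions on my part rather than the original author's recipe.

```python
# Illustrative residual diagnostics for a fitted OLS model (assumed setup).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

# Synthetic data: two predictors and roughly linear, homoscedastic noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

X_const = sm.add_constant(X)        # add the intercept column
model = sm.OLS(y, X_const).fit()
resid = model.resid

# Normality of residuals (Shapiro-Wilk): small p suggests non-normal errors.
_, p_normal = stats.shapiro(resid)

# Homoscedasticity (Breusch-Pagan): small p suggests non-constant variance.
_, p_bp, _, _ = het_breuschpagan(resid, X_const)

# Independence of errors (Durbin-Watson): values near 2 mean little autocorrelation.
dw = durbin_watson(resid)

print(f"R-squared:       {model.rsquared:.3f}")
print(f"Shapiro-Wilk p:  {p_normal:.3f}")
print(f"Breusch-Pagan p: {p_bp:.3f}")
print(f"Durbin-Watson:   {dw:.2f}")
```

Looking only at R-squared would miss all three of these checks, which is exactly the mistake the post warns against.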