Practical advice for analysis of large, complex data sets

By PATRICK RILEY For a number of years, I led the data science team for Google Search logs. We were often asked to make sense of confusing results, measure new phenomena from logged behavior, validate analyses done by others, and interpret metrics of user behavior. Some people seemed to be naturally good at doing this kind of high quality data…
Original Post: Practical advice for analysis of large, complex data sets

Statistics for Google Sheets

By STEVEN L. SCOTT Big data is new and exciting, but there are still lots of small data problems in the world. Many people who are just becoming aware that they need to work with data are finding that they lack the tools to do so. The statistics app for Google Sheets hopes to change that. Editor’s note: We’ve mostly portrayed…
Original Post: Statistics for Google Sheets

Next generation tools for data science

By DAVID ADAMS Since inception, this blog has defined “data science” as inference derived from data too big to fit on a single computer. Thus the ability to manipulate big data is essential to our notion of data science. While MapReduce remains a fundamental tool, many interesting analyses require more than it can offer. For instance, the well-known Mantel-Haenszel estimator cannot…
Original Post: Next generation tools for data science

Mind Your Units

By JEAN STEINER Randomized A/B experiments are the gold standard for estimating causal effects. The analysis can be straightforward, especially when it’s safe to assume that individual observations of an outcome measure are independent. However, this is not always the case. When observations are not independent, an analysis that assumes independence can lead us to believe that effects are significant when…
Original Post: Mind Your Units

To Balance or Not to Balance?

By IVAN DIAZ & JOSEPH KELLY Determining the causal effects of an action, which we call treatment, on an outcome of interest is at the heart of many data analysis efforts. In an ideal world, experimentation through randomization of the treatment assignment allows the identification and consistent estimation of causal effects. In observational studies treatment is assigned by nature, therefore its mechanism is…
Original Post: To Balance or Not to Balance?

Estimating causal effects using geo experiments

by JOUNI KERMAN, JON VAVER, and JIM KOEHLER Randomized experiments represent the gold standard for determining the causal effects of app or website design decisions on user behavior. We might be interested in comparing, for example, different subscription offers, different versions of terms and conditions, or different user interfaces. When it comes to online ads, there is also a fundamental…
Original Post: Estimating causal effects using geo experiments

Using Random Effects Models in Prediction Problems

by NICHOLAS A. JOHNSON, ALAN ZHAO, KAI YANG, SHENG WU, FRANK O. KUEHNEL, and ALI NASIRI AMINI In this post, we give a brief introduction to random effects models, and discuss some of their uses. Through simulation we illustrate issues with model-fitting techniques that depend on matrix factorization. Far from hypothetical, we have…
Original Post: Using Random Effects Models in Prediction Problems

LSOS experiments: how I learned to stop worrying and love the variability

by AMIR NAJMI In the previous post we looked at how large scale online services (LSOS) must contend with the high coefficient of variation (CV) of the observations of particular interest to them. In this post we explore why some standard statistical techniques to reduce variance are often ineffective in this “data-rich, information-poor” realm. Despite a very large number of experimental units,…
Original Post: LSOS experiments: how I learned to stop worrying and love the variability

The Notorious N.H.S.T. presents: Mo P-values Mo Problems

Alain Content writes: I am a psycholinguist who teaches statistics (and also sometimes publishes in Psych Sci). I am writing because as I am preparing for some future lessons, I fall back on a very basic question which has been worrying me for some time, related to the reasoning underlying NHST [null hypothesis significance testing]. Put simply, what is the…
Original Post: The Notorious N.H.S.T. presents: Mo P-values Mo Problems

“Chatting with the Tea Party”

I got an email last month offering two free tickets to the preview of a new play, Chatting with the Tea Party, described as “a documentary-style play about a New York playwright’s year attending Tea Party meetings around the country and interviewing local leaders. Nothing the Tea Party people in the play say has been made up.” I asked if…
Original Post: “Chatting with the Tea Party”