Static sensitivity analysis: Computing robustness of Bayesian inferences to the choice of hyperparameters

Ryan Giordano wrote: Last year at StanCon we talked about how you can differentiate under the integral to automatically calculate quantitative hyperparameter robustness for Bayesian posteriors. Since then, I’ve packaged the idea up into an R library that plays nice with Stan. You can install it from this github repo. I’m sure you’ll be pretty busy at StanCon, but I’ll be there presenting a poster about exactly this work, and if you have a moment to chat I’d be very interested to hear what you think! I’ve started applying this package to some of the Stan examples, and it’s already uncovered some (in my opinion) serious problems, like this one from chapter 13.5 of the ARM book. It’s easy to accidentally make a non-robust model, and I think a tool like this could be very useful to Stan users! As…
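The mechanics of “differentiating under the integral” fit in a few lines. Below is a minimal sketch (not the poster’s R package API, and with made-up data) of the identity as I understand it: the derivative of a posterior expectation with respect to a prior hyperparameter equals the posterior covariance between the quantity of interest and the derivative of the log prior with respect to that hyperparameter, which can be estimated directly from posterior draws. A conjugate normal-normal model is used only so the answer can be checked in closed form.

```python
# A minimal sketch (not the poster's R package) of the identity
#   d/d(eps) E[g(theta) | y, eps] = Cov_post( g(theta), d log p(theta | eps) / d(eps) ),
# estimated from posterior draws, on a conjugate normal-normal model with made-up data.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and hyperparameters
y = rng.normal(1.0, 2.0, size=20)    # observations; sigma = 2 treated as known
sigma, mu0, tau = 2.0, 0.0, 1.0      # likelihood sd, prior mean, prior sd

# Exact conjugate posterior for theta
prec = 1.0 / tau**2 + len(y) / sigma**2
post_var = 1.0 / prec
post_mean = post_var * (mu0 / tau**2 + y.sum() / sigma**2)

# Stand-in for MCMC output: draws from the posterior
theta = rng.normal(post_mean, np.sqrt(post_var), size=100_000)

# Score of the log prior with respect to the hyperparameter mu0, one value per draw
score_mu0 = (theta - mu0) / tau**2

# Monte Carlo estimate of d E[theta | y, mu0] / d mu0
sens_mc = np.cov(theta, score_mu0)[0, 1]

# Closed-form derivative of the posterior mean for this conjugate model, as a check
sens_exact = (1.0 / tau**2) / prec

print(f"sensitivity of posterior mean to mu0: MC {sens_mc:.4f}, exact {sens_exact:.4f}")
```

In a real Stan workflow the draws would come from MCMC rather than from the exact posterior, but the covariance estimate is computed the same way.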
Original Post: Static sensitivity analysis: Computing robustness of Bayesian inferences to the choice of hyperparameters

A Python program for multivariate missing-data imputation that works on large datasets!?

Alex Stenlake and Ranjit Lall write about a program they wrote for imputing missing data: Strategies for analyzing missing data have become increasingly sophisticated in recent years, most notably with the growing popularity of the best-practice technique of multiple imputation. However, existing algorithms for implementing multiple imputation suffer from limited computational efficiency, scalability, and capacity to exploit complex interactions among large numbers of variables. These shortcomings render them poorly suited to the emerging era of “Big Data” in the social and natural sciences. Drawing on new advances in machine learning, we have developed an easy-to-use Python program – MIDAS (Multiple Imputation with Denoising Autoencoders) – that leverages principles of Bayesian nonparametrics to deliver a fast, scalable, and high-performance implementation of multiple imputation. MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are capable of producing complex,…
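For readers who want the core idea rather than the library, here is a toy sketch (not the MIDAS code, and on simulated data) of denoising-autoencoder imputation: fill missing cells with column means, randomly corrupt the inputs during training, train the network to reconstruct the data while scoring the loss only on observed cells, then use the trained network’s output to fill in the missing cells.

```python
# Toy sketch of denoising-autoencoder imputation (not the MIDAS implementation).
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 rows, 5 correlated columns, about 20% missing at random
n, d, hidden = 500, 5, 8
latent = rng.normal(size=(n, 2))
X_full = latent @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(n, d))
observed = rng.random((n, d)) > 0.2           # True where a cell is observed
X = np.where(observed, X_full, np.nan)

# Mean-fill as a starting point
col_means = np.nanmean(X, axis=0)
X_filled = np.where(observed, X, col_means)

# One-hidden-layer autoencoder trained by full-batch gradient descent
W1 = rng.normal(scale=0.1, size=(d, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, d)); b2 = np.zeros(d)
lr = 0.01

for step in range(2000):
    # Denoising corruption: randomly zero out half of the (filled) inputs
    corrupt = X_filled * (rng.random((n, d)) > 0.5)
    H = np.tanh(corrupt @ W1 + b1)            # encoder
    X_hat = H @ W2 + b2                       # decoder (linear output)

    # Squared-error loss on observed cells only, then backpropagation
    err = (X_hat - X_filled) * observed
    grad_out = 2 * err / observed.sum()
    gW2 = H.T @ grad_out;      gb2 = grad_out.sum(axis=0)
    grad_H = grad_out @ W2.T * (1 - H**2)
    gW1 = corrupt.T @ grad_H;  gb1 = grad_H.sum(axis=0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g

# Impute: run the uncorrupted filled data through the network once
H = np.tanh(X_filled @ W1 + b1)
X_imputed = np.where(observed, X, H @ W2 + b2)
print("RMSE on the held-out (missing) cells:",
      np.sqrt(np.mean((X_imputed[~observed] - X_full[~observed])**2)))
```

A real multiple-imputation workflow would produce several completed datasets rather than the single fill shown here.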
Original Post: A Python program for multivariate missing-data imputation that works on large datasets!?

Three new domain-specific (embedded) languages with a Stan backend

One is an accident. Two is a coincidence. Three is a pattern. Perhaps it’s no coincidence that there are three new interfaces that use Stan’s C++ implementation of adaptive Hamiltonian Monte Carlo (currently an updated version of the no-U-turn sampler). ScalaStan embeds a Stan-like language in Scala; it’s a Scala package largely (if not entirely) written by Joe Wingbermuehle. [GitHub link] tmbstan lets you fit TMB models with Stan; it’s an R package listing Kasper Kristensen as author. [CRAN link] SlicStan is a “blockless” and self-optimizing version of Stan; it’s a standalone language written in F# by Maria Gorinova. [pdf language spec] These are in contrast with systems that entirely reimplement a version of the no-U-turn sampler, such as PyMC3, ADMB, and NONMEM.
Original Post: Three new domain-specific (embedded) languages with a Stan backend

How does probabilistic computation differ in physics and statistics?

[image of Schrodinger’s cat, of course] Stan collaborator Michael Betancourt wrote an article, “The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo,” discussing how various ideas of computational probability moved from physics to statistics. Three things I wanted to add to Betancourt’s story: 1. My paper with Rubin on R-hat, that measure of mixing for iterative simulation, came in part from my reading of old papers in the computational physics literature, in particular Fosdick (1959), which proposed a multiple-chain approach to monitoring convergence. What we added in our 1992 paper was the within-chain comparison: instead of simply comparing multiple chains to each other, we compared their variance to the within-chain variance. This enabled the diagnostic to be much more automatic. 2. Related to point 1 above: It’s my impression that computational physics is…
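For readers new to R-hat, here is a minimal sketch of the between/within comparison described in point 1: the variance of the chain means is compared with the average within-chain variance and folded into the potential scale reduction factor. This is the original 1992 version, without the split-chain and rank-normalization refinements used in current Stan.

```python
# Minimal sketch of the original (1992) potential scale reduction factor R-hat,
# without the split-chain and rank-normalization refinements used in current Stan.
import numpy as np

def rhat(chains):
    """chains: array of shape (m, n), m chains of n draws each, for one scalar."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance (scaled by n)
    var_hat = (n - 1) / n * W + B / n          # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)

# Quick check on fake draws: well-mixed chains give R-hat near 1;
# chains stuck in different places give R-hat well above 1.
rng = np.random.default_rng(0)
print(rhat(rng.normal(size=(4, 1000))))
print(rhat(rng.normal(loc=[[0.0], [0.0], [3.0], [3.0]], size=(4, 1000))))
```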
Original Post: How does probabilistic computation differ in physics and statistics?

Stopping rules and Bayesian analysis

Posted by Andrew on 3 January 2018, 10:24 pm. This is an old one, but I think there may still be interest in the topic. In this post, I explain how to think about stopping rules in Bayesian inference and why, from a Bayesian standpoint, it’s not cheating to run an experiment until you get statistical significance and then stop. If the topic interests you, I recommend you read the main post (including the P.S.) and also the comments.
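One way to see the point is by simulation. This sketch (an illustration added here, not code from the original post) draws the true parameter from the prior, collects data with optional stopping at conventional significance, and checks that 95% posterior intervals still cover the truth at about the nominal rate, because the posterior depends only on the data actually observed, not on the rule that ended data collection.

```python
# Simulation sketch: Bayesian interval coverage is unaffected by optional stopping
# when the parameter really is drawn from the prior and the model is correct.
import numpy as np

rng = np.random.default_rng(0)
n_max, sigma = 100, 1.0            # at most n_max observations; known sd
n_sims, covered = 5000, 0

for _ in range(n_sims):
    theta = rng.normal(0.0, 1.0)               # draw the "truth" from the N(0, 1) prior
    y = []
    for n in range(1, n_max + 1):
        y.append(rng.normal(theta, sigma))
        z = np.mean(y) / (sigma / np.sqrt(n))
        if abs(z) > 1.96:                       # stop as soon as the test is "significant"
            break
    y = np.asarray(y)
    # Conjugate posterior under the N(0, 1) prior: depends only on the observed y
    prec = 1.0 + len(y) / sigma**2
    post_mean = (y.sum() / sigma**2) / prec
    post_sd = np.sqrt(1.0 / prec)
    covered += (post_mean - 1.96 * post_sd < theta < post_mean + 1.96 * post_sd)

print("coverage of 95% posterior intervals under optional stopping:", covered / n_sims)
```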
Original Post: Stopping rules and Bayesian analysis

“Handling Multiplicity in Neuroimaging through Bayesian Lenses with Hierarchical Modeling”

Donald Williams points us to this new paper by Gang Chen, Yaqiong Xiao, Paul Taylor, Tracy Riggins, Fengji Geng, Elizabeth Redcay, and Robert Cox: In neuroimaging, the multiplicity issue may sneak into data analysis through several channels . . . One widely recognized aspect of multiplicity, multiple testing, occurs when the investigator fits a separate model for each voxel in the brain. However, multiplicity also occurs when the investigator conducts multiple comparisons within a model, tests two tails of a t-test separately when prior information is unavailable about the directionality, and branches in the analytic pipelines. . . . More fundamentally, the adoption of dichotomous decisions through sharp thresholding under NHST may not be appropriate when the null hypothesis itself is not pragmatically relevant because the effect of interest takes a continuum instead of discrete values and is not expected…
Original Post: “Handling Multiplicity in Neuroimaging through Bayesian Lenses with Hierarchical Modeling”

The failure of null hypothesis significance testing when studying incremental changes, and what to do about it

A few months ago I wrote a post, “Cage match: Null-hypothesis-significance-testing meets incrementalism. Nobody comes out alive.” I soon after turned it into an article, published in Personality and Social Psychology Bulletin, with the title given above and the following abstract: A standard mode of inference in social and behavioral science is to establish stylized facts using statistical significance in quantitative studies. However, in a world in which measurements are noisy and effects are small, this will not work: selection on statistical significance leads to effect sizes which are overestimated and often in the wrong direction. After a brief discussion of two examples, one in economics and one in social psychology, we consider the procedural solution of open post-publication review, the design solution of devoting more effort to accurate measurements and within-person comparisons, and the statistical analysis solution of…
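The central claim can be checked in a few lines of simulation (an illustration added here, not taken from the paper): when the true effect is small relative to the standard error, the estimates that happen to clear the significance threshold are badly exaggerated and occasionally have the wrong sign.

```python
# Simulation sketch of the type M (magnitude) and type S (sign) error problem.
import numpy as np

rng = np.random.default_rng(0)
true_effect, se = 2.0, 8.0                        # small true effect, noisy estimate (arbitrary units)
est = rng.normal(true_effect, se, size=100_000)   # many hypothetical replications of the study
signif = np.abs(est / se) > 1.96                  # keep only the "significant" results

print("share of replications reaching significance:", signif.mean())
print("mean |estimate| among significant results:", np.abs(est[signif]).mean(),
      "vs. true effect", true_effect)
print("share of significant estimates with the wrong sign:", (est[signif] < 0).mean())
```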
Original Post: The failure of null hypothesis significance testing when studying incremental changes, and what to do about it

Setting up a prior distribution in an experimental analysis

Posted by Andrew on 23 December 2017, 9:55 am. Baruch Eitam writes: My colleague and I have gotten into a slight dispute about prior selection. Below are our 3 different opinions; the first is the uniform (will get to that in a sec) and the other two are the priors of dispute. The parameter we are trying to estimate is people’s reporting ability under conditions in which a stimulus which they have just seen is “task irrelevant”: as they have 5 options to pick from, chance level is .2. My preferred prior is the higher one, as it reflects our more conservative estimation of the effect (higher modal theta -> fewer errors). My colleague, on the other hand, opted for averaging all our previous experiments, which ended up giving larger…
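The actual priors and data under dispute are not given in the excerpt, so the following is only a hypothetical Beta-binomial sketch of the kind of comparison being debated: how much two different priors on the reporting-ability parameter theta (chance level .2, given the 5 response options) move the posterior for the same data.

```python
# Hypothetical Beta-binomial sketch; the priors and the data below are made up
# for illustration and are not the ones from the email.
import numpy as np
from scipy import stats

priors = {"higher-mode prior": (8, 4),    # Beta(8, 4): mode 0.7, fewer errors expected
          "flatter prior":     (2, 2)}    # Beta(2, 2): mode 0.5, more diffuse

k, n = 14, 20                              # made-up data: 14 correct reports out of 20 trials

for name, (a, b) in priors.items():
    post = stats.beta(a + k, b + n - k)    # conjugate Beta posterior for theta
    lo, hi = post.ppf([0.025, 0.975])
    print(f"{name}: posterior mean {post.mean():.3f}, "
          f"95% interval ({lo:.3f}, {hi:.3f}), P(theta > 0.2) = {1 - post.cdf(0.2):.3f}")
```

With this much data the two posteriors end up fairly close; with fewer trials the prior choice matters more.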
Original Post: Setting up a prior distribution in an experimental analysis

R-squared for Bayesian regression models

Posted by Andrew on 21 December 2017, 9:03 am. Ben, Jonah, Imad, and I write: The usual definition of R-squared (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the variance of the predicted values divided by the variance of predicted values plus the variance of the errors. This summary is computed automatically for linear and generalized linear regression models fit using rstanarm, our R package for fitting Bayesian applied regression models with Stan. . . . The full paper is here.
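For anyone who wants the number directly from posterior draws, here is a minimal numerical sketch of the proposed definition (not the rstanarm implementation): for each draw, take the variance of that draw’s fitted values divided by the same variance plus the variance of the corresponding residuals, so the ratio can never exceed one.

```python
# Minimal sketch of Bayesian R-squared per posterior draw:
#   var(fitted) / (var(fitted) + var(residuals))
import numpy as np

def bayes_r2(y_pred_draws, y):
    """y_pred_draws: (draws, n) posterior fitted values; y: (n,) observed outcomes."""
    var_fit = y_pred_draws.var(axis=1, ddof=1)          # variance of the predicted values
    var_res = (y - y_pred_draws).var(axis=1, ddof=1)    # variance of the errors
    return var_fit / (var_fit + var_res)                # one R-squared per draw

# Stand-in posterior for a simple linear regression, just to show the shapes
rng = np.random.default_rng(0)
n, draws = 50, 1000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=2.0, size=n)
a = rng.normal(1.0, 0.3, size=(draws, 1))   # stand-in draws of the intercept
b = rng.normal(2.0, 0.3, size=(draws, 1))   # stand-in draws of the slope
r2 = bayes_r2(a + b * x, y)
print("posterior median Bayesian R-squared:", np.median(r2))
```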
Original Post: R-squared for Bayesian regression models

We need to stop sacrificing women on the altar of deeply mediocre men (ISBA edition)

(This is not Andrew. I would ask you not to speculate in the comments about who S is; this is not a great venue for that.) Kristian Lum just published an essay about her experiences being sexually assaulted at statistics conferences. You should read the whole thing because it’s important, but here’s a sample paragraph. I debated saying something about him at the time, but who would have cared? It was a story passed down among female graduate students in my circles that when one woman graduate student was groped at a party by a professor and reported it to a senior female professor, she was told that if she wanted to stay in the field, she’d just have to get used to it. On many occasions, I have been smacked on the butt at conferences. No one ever seemed to…
Original Post: We need to stop sacrificing women on the altar of deeply mediocre men (ISBA edition)