
A Python program for multivariate missing-data imputation that works on large datasets!?


Alex Stenlake and Ranjit Lall write about a program they wrote for imputing missing data:

Strategies for analyzing missing data have become increasingly sophisticated in recent years, most notably with the growing popularity of the best-practice technique of multiple imputation. However, existing algorithms for implementing multiple imputation suffer from limited computational efficiency, scalability, and capacity to exploit complex interactions among large numbers of variables. These shortcomings render them poorly suited to the emerging era of “Big Data” in the social and natural sciences. Drawing on new advances in machine learning, we have developed an easy-to-use Python program – MIDAS (Multiple Imputation with Denoising Autoencoders) – that leverages principles of Bayesian nonparametrics to deliver a fast, scalable, and high-performance implementation of multiple imputation. MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are capable of producing complex,…
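The excerpt describes the approach only at a high level. As a rough illustration of the general idea, and not the MIDAS program's actual code or interface, here is a minimal NumPy-only sketch of multiple imputation with a denoising autoencoder. Everything in it is an assumption made for illustration: the hypothetical helper `midas_style_impute`, its parameters, the single tanh hidden layer, full-batch gradient descent, random zeroing of inputs as the "denoising" corruption, and keeping hidden-layer dropout switched on at imputation time so that each of the `m` completed datasets is a different stochastic draw.

```python
import numpy as np


def midas_style_impute(X, m=5, hidden=16, corrupt=0.2, drop=0.2,
                       lr=0.1, epochs=1000, seed=0):
    """Return m completed copies of X, where np.nan marks missing cells.

    Hypothetical helper (not the MIDAS API): a one-hidden-layer denoising
    autoencoder trained on observed cells only, with dropout kept on at
    imputation time so the m draws differ.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)                        # True where a value was observed
    M = obs.astype(float)
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    Z = np.where(obs, (X - mu) / sd, 0.0)     # standardize; missing -> column mean (0)
    n, p = Z.shape
    W1 = rng.normal(0.0, 0.1, (p, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, p)); b2 = np.zeros(p)

    def forward(Zin):
        H = np.tanh(Zin @ W1 + b1)
        keep = rng.random(H.shape) > drop     # hidden-unit dropout mask
        Hd = H * keep / (1 - drop)
        return H, keep, Hd, Hd @ W2 + b2

    for _ in range(epochs):
        # Denoising: randomly zero some inputs (rescaled, as in dropout).
        Zin = Z * (rng.random(Z.shape) > corrupt) / (1 - corrupt)
        H, keep, Hd, Y = forward(Zin)
        # Mean squared reconstruction error, counted on observed cells only.
        dY = 2 * M * (Y - Z) / M.sum()
        dW2 = Hd.T @ dY; db2 = dY.sum(axis=0)
        dH = (dY @ W2.T) * keep / (1 - drop)
        dA = dH * (1 - H ** 2)                # derivative of tanh
        dW1 = Zin.T @ dA; db1 = dA.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    draws = []
    for _ in range(m):
        _, _, _, Y = forward(Z)               # dropout still on: a stochastic draw
        draws.append(np.where(obs, X, Y * sd + mu))  # observed values kept as-is
    return draws
```

For example, with two strongly related columns and about 15% of cells deleted at random, `midas_style_impute(X, m=5)` returns five completed arrays that agree on every observed cell and differ only in the imputed ones; that between-draw spread is what multiple imputation then carries into downstream estimates. The real program is described as deeper and built for much larger datasets, so treat this only as a picture of the mechanism.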