Classification combines prediction and decision making and usurps the decision maker in specifying costs of wrong decisions. The classification rule must be reformulated if costs/utilities change. Predictions are separate from decisions and can be used by any decision maker. The field of machine learning arose somewhat independently of the field of statistics. As a result, machine learning experts tend not to emphasize probabilistic thinking. Probabilistic thinking and understanding uncertainty and variation are hallmarks of statistics. By the way, one of the best books about probabilistic thinking is Nate Silver’s The Signal and The Noise: Why So Many Predictions Fail But Some Don’t. In the medical field, a classic paper is David Spiegelhalter’s Probabilistic Prediction in Patient Management and Clinical Trials.By not thinking probabilistically, machine learning advocates frequently utilize classifiers instead of using risk prediction models. The situation has gotten acute: many…

Original Post: Classification vs. Prediction

# Statistical Thinking

## Statistical Criticism is Easy; I Need to Remember That Real People are Involved

I have been critical of a number of articles, authors, and journals in this growing blog article. Linking the blog with Twitter is a way to expose the blog to more readers. It is far too easy to slip into hyperbole on the blog and even easier on Twitter with its space limitations. Importantly, many of the statistical problems pointed out in my article, are very, very common, and I dwell on recent publications to get the point across that inadequate statistical review at medical journals remains a serious problem. Equally important, many of the issues I discuss, from p-values, null hypothesis testing to issues with change scores are not well covered in medical education (of authors and referees), and p-values have caused a phenomenal amount of damage to the research enterprise. Still, journals insist on emphasizing p-values. I spend…

Original Post: Statistical Criticism is Easy; I Need to Remember That Real People are Involved

## Statistical Errors in the Medical Literature

Misinterpretation of P-values and Main Study Results Dichotomania Problems With Change Scores Improper Subgrouping Serial Data and Response Trajectories As Doug Altman famously wrote in his Scandal of Poor Medical Research in BMJ in 1994, the quality of how statistical principles and analysis methods are applied in medical research is quite poor. According to Doug and to many others such as Richard Smith, the problems have only gotten worse. The purpose of this blog article is to contain a running list of new papers in major medical journals that are statistically problematic, based on my random encounters with the literature.One of the most pervasive problems in the medical literature (and in other subject areas) is misuse and misinterpretation of p-values as detailed here, and chief among these issues is perhaps the absence of evidence is not evidence of absence error written about so clearly by Altman…

Original Post: Statistical Errors in the Medical Literature

## Continuous Learning from Data: No Multiplicities from Computing and Using Bayesian Posterior Probabilities as Often as Desired

(In a Bayesian analysis) It is entirely appropriate to collect datauntil a point has been proven or disproven, or until the data collectorruns out of time, money, or patience. – Edwards, Lindman, Savage (1963) Bayesian inference, which follows the likelihood principle, is not affected by the experimental design or intentions of the investigator. P-values can only be computed if both of these are known, and as been described by Berry (1987) and others, it is almost never the case that the computation of the p-value at the end of a study takes into account all the changes in design that were necessitated when pure experimental designs encounter the real world. When performing multiple data looks as a study progress, one can accelerate learning by more quickly abandoning treatments that do not work, by sometimes stopping early for efficacy, and frequently…

Original Post: Continuous Learning from Data: No Multiplicities from Computing and Using Bayesian Posterior Probabilities as Often as Desired

## Bayesian vs. Frequentist Statements About Treatment Efficacy

The following examples are intended to show the advantages of Bayesian reporting of treatment efficacy analysis, as well as to provide examples contrasting with frequentist reporting. As detailed here, there are many problems with p-values, and some of those problems will be apparent in the examples below. Many of the advantages of Bayes are summarized here. As seen below, Bayesian posterior probabilities prevent one from concluding equivalence of two treatments on an outcome when the data do not support that (i.e., the “absence of evidence is not evidence of absence” error). Suppose that a parallel group randomized clinical trial is conducted to gather evidence about the relative efficacy of new treatment B to a control treatment A. Suppose there are two efficacy endpoints: systolic blood pressure (SBP) and time until cardiovascular/cerebrovascular event. Treatment effect on the first endpoint is assumed…

Original Post: Bayesian vs. Frequentist Statements About Treatment Efficacy

## Introduction

Statistics is a field that is a science unto itself and that benefits all other fields and everyday life. What is unique about statistics is its proven tools for decision making in the face of uncertainty, understanding sources of variation and bias, and most importantly, statistical thinking. Statistical thinking is a different way of thinking that is part detective, skeptical, and involves alternate takes on a problem. An excellent example of statistical thinking is statistician Abraham Wald’s analysis of British bombers surviving to return to their base in World War II: his conclusion was to reinforce bombers in areas in which no damage was observed. For other great examples watch my colleague Chris Fonnesbeck’s Statistical Thinking for Data Science.Some of my personal philosophy of statistics can be summed up in the list below:Statistics needs to be fully integrated into research; experimental design…

Original Post: Introduction

## p-values and Type I Errors are Not the Probabilities We Need

In trying to guard against false conclusions, researchers often attempt to minimize the risk of a “false positive” conclusion. In the field of assessing the efficacy of medical and behavioral treatments for improving subjects’ outcomes, falsely concluding that a treatment is effective when it is not is an important consideration. Nowhere is this more important than in the drug and medical device regulatory environments, because a treatment thought not to work can be given a second chance as better data arrive, but a treatment judged to be effective may be approved for marketing, and if later data show that the treatment was actually not effective (or was only trivially effective) it is difficult to remove the treatment from the market if it is safe. The probability of a treatment not being effective is the probability of “regulator’s regret.” One…

Original Post: p-values and Type I Errors are Not the Probabilities We Need

## Integrating Audio, Video, and Discussion Boards with Course Notes

As a biostatistics teacher I’ve spent a lot of time thinking about inverting the classroom and adding multimedia content. My first thought was to create YouTube videos corresponding to sections in my lecture notes. This typically entails recording the computer screen while going through slides, adding a voiceover. I realized that the maintenance of such videos is difficult, and this also creates a barrier to adding new content. In addition, the quality of the video image is lower than just having the student use a pdf viewer on the original notes. For those reasons I decided to create audio narration for the sections in the notes to largely capture what I would say during a live lecture. The audio mp3 files are stored on a local server and are streamed on demand when a study clicks on the audio icon…

Original Post: Integrating Audio, Video, and Discussion Boards with Course Notes

## My Journey From Frequentist to Bayesian Statistics

Type I error for smoke detector: probability of alarm given no fire=0.05Bayesian: probability of fire given current air dataFrequentist smoke alarm designed as most research is done:Set the alarm trigger so as to have a 0.8 chance of detecting an infernoAdvantage of actionable evidence quantification:Set the alarm to trigger when the posterior probability of a fire exceeds 0.02 while at home and at 0.01 while away If I had been taught Bayesian modeling before being taught the frequentist paradigm, I’m sure I would have always been a Bayesian. I started becoming a Bayesian about 1994 because of an influential paper by David Spiegelhalter and because I worked in the same building at Duke University as Don Berry. Two other things strongly contributed to my thinking: difficulties explaining p-values and confidence intervals (especially the latter) to clinical researchers, and difficulty of learning group…

Original Post: My Journey From Frequentist to Bayesian Statistics

## EHRs and RCTs: Outcome Prediction vs. Optimal Treatment Selection

Frank HarrellProfessor of BiostatisticsVanderbilt University School of MedicineLaura LazzeroniProfessor of Psychiatry and, by courtesy, of Medicine (Cardiovascular Medicine) and of Biomedical Data ScienceStanford University School of MedicineRevised July 17, 2017 It is often said that randomized clinical trials (RCTs) are the gold standard for learning about therapeutic effectiveness. This is because the treatment is assigned at random so no variables, measured or unmeasured, will be truly related to treatment assignment. The result is an unbiased estimate of treatment effectiveness. On the other hand, observational data arising from clinical practice has all the biases of physicians and patients in who gets which treatment. Some treatments are indicated for certain types of patients; some are reserved for very sick ones. The fact is that the selection of treatment is often chosen on the basis of patient characteristics that influence patient outcome, some…

Original Post: EHRs and RCTs: Outcome Prediction vs. Optimal Treatment Selection