John Carlin and I write: It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis is true. A common conceptual error is that researchers take the rejection of a straw-man null as evidence in favor of their preferred alternative. A standard mode of operation goes like this: p < 0.05 is taken as strong evidence against the null hypothesis, p > 0.15 is taken as evidence in favor of the null, and p near 0.10 is taken either as weak evidence for an effect or as evidence of a weak effect. Unfortunately, none of those inferences is generally appropriate: a…

Original Post: Some natural solutions to the p-value communication problem—and why they won’t work.

# Bayesian Statistics

## A continuous hinge function for statistical modeling

This comes up sometimes in my applied work: I want a continuous “hinge function,” something like the red curve above, connecting two straight lines in a smooth way. Why not include the sharp corner (in this case, the function y=-0.5*x if x<0 or y=0.2*x if x>0)? Two reasons. First, computation: Hamiltonian Monte Carlo can trip on discontinuities. Second, I want a smooth curve anyway, as I’d expect it to better describe reality. Indeed, the linear parts of the curve are themselves typically only approximations. So, when I’m putting this together, I don’t want to take two lines and then stitch them together with some sort of quadratic or cubic, creating a piecewise function with three parts. I just want one simple formula that asymptotes to the lines, as in the above picture. As I said, this problem comes up occasion,…

Original Post: A continuous hinge function for statistical modeling

## Causal inference using Bayesian additive regression trees: some questions and answers

[cat picture] Rachael Meager writes: We’re working on a policy analysis project. Last year we spoke about individual treatment effects, which is the direction we want to go in. At the time you suggested BART [Bayesian additive regression trees; these are not averages of tree models as are usually set up; rather, the key is that many little nonlinear tree models are being summed; in that sense, Bart is more like a nonparametric discrete version of a spline model. —AG]. But there are 2 drawbacks of using BART for this project. (1) BART predicts the outcome not the individual treatment effect – although those are obviously related and there has been some discussion of this in the econ literature. (2) It will be hard for us to back out the covariate combinations / interactions that predict the outcomes / treatment…

Original Post: Causal inference using Bayesian additive regression trees: some questions and answers

## Using Stan for week-by-week updating of estimated soccer team abilites

Milad Kharratzadeh shares this analysis of the English Premier League during last year’s famous season. He fit a Bayesian model using Stan, and the R markdown file is here. The analysis has three interesting features: 1. Team ability is allowed to continuously vary throughout the season; thus, once the season is over, you can see an estimate of which teams were improving or declining. 2. But that’s not what is shown in the plot above. Rather, the plot above shows estimated team abilities after the model was fit to prior information plus week 1 data alone; prior information plus data from weeks 1 and 2; prior information plus data from weeks 1, 2, and 3; etc. For example, look at the plot for surprise victor Leicester City: after a few games, the team is already estimated to be in the…

Original Post: Using Stan for week-by-week updating of estimated soccer team abilites

## Splines in Stan! (including priors that enforce smoothness)

Milad Kharratzadeh shares a new case study. This could be useful to a lot of people.Just for example, here’s the last section of the document, which shows how to simulate the data and fit the model graphed above:Location of Knots and the Choice of Priors In practical problems, it is not always clear how to choose the number/location of the knots. Choosing too many/too few knots may lead to overfitting/underfitting. In this part, we introduce a prior that alleviates the problems associated with the choice of number/locations of the knots to a great extent. Let us start by a simple observation. For any given set of knots, and any B-spline order, we have: $$ sum_{i} B_{i,k}(x) = 1. $$ The proof is simple and can be done by induction. This means that if the B-spline coefficients, $a_i = a$, are…

Original Post: Splines in Stan! (including priors that enforce smoothness)

## Accounting for variation and uncertainty

Accounting for variation and uncertainty Posted by Andrew on 12 May 2017, 9:35 am [cat picture] Yesterday I gave a list of the questions they’re asking me when I speak at the Journal of Accounting Research Conference. All kidding aside, I think that a conference of accountants is the perfect setting for a discussion of of research integrity, as accounting is all about setting up institutions to enable trust. The challenge is that traditional accounting is deterministic: there’s a ledger and that’s that. In statistics, we talk all the time about accounting for variation and uncertainty. Maybe “accounting” is more than a metaphor here, and maybe there’s more of a connection to the traditional practices of accounting than I’d thought.

Original Post: Accounting for variation and uncertainty

## How to interpret “p = .06” in situations where you really really want the treatment to work?

We’ve spent a lot of time during the past few years discussing the difficulty of interpreting “p less than .05” results from noisy studies. Standard practice is to just take the point estimate and confidence interval, but this is in general wrong in that it overestimates effect size (type M error) and can get the direction wrong (type S error). So what about noisy studies where the p-value is more than .05, that is, where the confidence interval includes zero? Standard practice here is to just declare this as a null effect, but of course that’s not right either, as the estimate of 0 is surely a negatively biased estimate of the magnitude of the effect. When the confidence interval includes 0, we can typically say that the data are consistent with no effect. But that doesn’t mean the true…

Original Post: How to interpret “p = .06” in situations where you really really want the treatment to work?

## A completely reasonable-sounding statement with which I strongly disagree

From a couple years ago: In the context of a listserv discussion about replication in psychology experiments, someone wrote: The current best estimate of the effect size is somewhere in between the original study and the replication’s reported value. This conciliatory, split-the-difference statement sounds reasonable, and it might well represent good politics in the context of a war over replications—but from a statistical perspective I strongly disagree with it, for the following reason. The original study’s estimate typically has a huge bias (due to the statistical significance filter). The estimate from the replicated study, assuming it’s a preregistered replication, is unbiased. I think in such a setting the safest course is to use the replication’s reported value as our current best estimate. That doesn’t mean that the original study is “wrong,” but it is wrong to report a biased estimate…

Original Post: A completely reasonable-sounding statement with which I strongly disagree

## “This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything”

“This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything” Posted by Andrew on 4 May 2017, 9:56 am [cat picture] In the context of a statistical application, someone wrote: Since data is retrospective I had to use informative prior. The fit of urine improved significantly (very good) without really affecting concentration. This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything. Hopefully in this case I can justify the prior that 5% error in urine measurements is reasonable. I responded to “This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything” as follows: That’s ok for me. The point is that FDA should require that the prior be science-based. For example, consider the normal(0, infinity) priors that are implicitly…

Original Post: “This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything”

## Prior information, not prior belief

From a couple years ago: The prior distribution p(theta) in a Bayesian analysis is often presented as a researcher’s beliefs about theta. I prefer to think of p(theta) as an expression of information about theta. Consider this sort of question that a classically-trained statistician asked me the other day: If two Bayesians are given the same data, they will come to two conclusions. What do you think about that? Does it bother you? My response is that the statistician has nothing to do with it. I’d prefer to say that if two different analyses are done using different information, they will come to different conclusions. This different information can come in the prior distribution p(theta), it could come in the data model p(y|theta), it could come in the choice of how to set up the model and what data to…

Original Post: Prior information, not prior belief