Some natural solutions to the p-value communication problem—and why they won’t work.

John Carlin and I write: It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis is true. A common conceptual error is that researchers take the rejection of a straw-man null as evidence in favor of their preferred alternative. A standard mode of operation goes like this: p < 0.05 is taken as strong evidence against the null hypothesis, p > 0.15 is taken as evidence in favor of the null, and p near 0.10 is taken either as weak evidence for an effect or as evidence of a weak effect. Unfortunately, none of those inferences is generally appropriate: a…
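To see the problem concretely, here is a minimal simulation sketch (my own toy example, not from the paper; the true effect of 0.3 sd and n = 50 per arm are hypothetical). One fixed design, with one fixed real effect, routinely produces p-values in all three of the categories in the rule of thumb above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, sd, n, reps = 0.3, 1.0, 50, 10_000  # hypothetical settings

# Simulate many exact replications of the same two-arm study.
pvals = np.array([
    stats.ttest_ind(rng.normal(true_effect, sd, n),
                    rng.normal(0.0, sd, n)).pvalue
    for _ in range(reps)
])

# The same design and the same true effect land in "strong evidence",
# "weak evidence", and "evidence for the null" under the quoted rule.
print(f"p < 0.05:          {np.mean(pvals < 0.05):.0%}")
print(f"0.05 <= p <= 0.15: {np.mean((pvals >= 0.05) & (pvals <= 0.15)):.0%}")
print(f"p > 0.15:          {np.mean(pvals > 0.15):.0%}")
```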

What is needed to do good research (hint: it’s not just the avoidance of “too much weight given to small samples, a tendency to publish positive results and not negative results, and perhaps an unconscious bias from the researchers themselves”)

[cat picture] In a news article entitled "No, Wearing Red Doesn’t Make You Hotter," Dalmeet Singh Chawla recounts the story of yet another Psychological Science / PPNAS-style study (this one actually appeared back in 2008 in the Journal of Personality and Social Psychology, the same prestigious journal that published Daryl Bem’s ESP study a couple of years later). Chawla’s article is just fine, and I think these non-replications should continue to get press, as much press as the original flawed studies. I have just two problems. The first is when Chawla writes: The issues at hand seem to be the same ones surfacing again and again in the replication crisis—too much weight given to small samples, a tendency to publish positive results and not negative results, and perhaps an unconscious bias from the researchers themselves. I mean, sure, yeah, I agree with…

Mockery is the best medicine

Posted by Andrew on 11 May 2017, 4:37 pm [cat picture] I’m usually not such a fan of twitter, but Jeff sent me this, from Andy Hall, and it’s just hilarious: The background is here. But Hall is missing a few key determinants of elections and political attitudes: subliminal smiley faces, college football, fat arms, and, of course, That Time of the Month. You can see why I can’t do twitter. I’m not concise enough.

“P-hacking” and the intention-to-cheat effect

Posted by Andrew on 10 May 2017, 5:53 pm I’m a big fan of the work of Uri Simonsohn and his collaborators, but I don’t like the term “p-hacking” because it can be taken to imply an intention to cheat. The image of p-hacking is of a researcher trying test after test on the data until reaching the magic “p less than .05.” But, as Eric Loken and I discuss in our paper on the garden of forking paths, multiple comparisons can be a problem even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. I worry that the widespread use of the term “p-hacking” gives two wrong impressions: First, it implies that the many researchers who use p-values incorrectly are cheating or “hacking,” even though I suspect they’re mostly…
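The forking-paths point is easy to demonstrate in simulation. Here is a toy sketch (my own illustration, not Gelman and Loken’s example; the subgroups and sample sizes are made up): the analyst runs exactly one test, with no fishing at all, but which test gets run depends on the data, and the false-positive rate climbs well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 40, 10_000  # hypothetical subgroup size and replication count
false_positives = 0

for _ in range(reps):
    # The null is true everywhere: no effect in either subgroup.
    men = rng.normal(0, 1, (2, n))      # row 0: treatment, row 1: control
    women = rng.normal(0, 1, (2, n))

    # Forking path: the analyst looks at the data and tests whichever
    # subgroup shows the larger apparent effect. One test, chosen post hoc.
    gap_m = men[0].mean() - men[1].mean()
    gap_w = women[0].mean() - women[1].mean()
    group = men if abs(gap_m) > abs(gap_w) else women

    p = stats.ttest_ind(group[0], group[1]).pvalue
    false_positives += p < 0.05

# Typically around 9-10% here, versus the nominal 5%, with no cheating.
print(f"false-positive rate: {false_positives / reps:.1%}")
```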

We fiddle while Rome burns: p-value edition

Raghu Parthasarathy presents a wonderfully clear example of disastrous p-value-based reasoning that he saw in a conference presentation. Here’s Raghu: Consider, for example, some tumorous cells that we can treat with drugs 1 and 2, either alone or in combination. We can make measurements of growth under our various drug treatment conditions. Suppose our measurements give us the following graph: . . . from which we tell the following story: When administered on their own, drugs 1 and 2 are ineffective — tumor growth isn’t statistically different than the control cells (p > 0.05, 2 sample t-test). However, when the drugs are administered together, they clearly affect the cancer (p < 0.05); in fact, the p-value is very small (0.002!). This indicates a clear synergy between the two drugs: together they have a much stronger effect than each alone does.…
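That pattern does not require synergy. Here is a minimal simulation sketch (my own numbers, purely illustrative): give each drug the same modest effect, make the combination exactly additive, and the “significant together, not significant apart” story shows up a large fraction of the time anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sd, effect = 8, 1.0, -0.8   # hypothetical: small samples, modest effect
reps, pattern = 10_000, 0

for _ in range(reps):
    control = rng.normal(0.0, sd, n)
    drug1 = rng.normal(effect, sd, n)
    drug2 = rng.normal(effect, sd, n)
    both = rng.normal(2 * effect, sd, n)   # exactly additive: zero synergy

    p1 = stats.ttest_ind(drug1, control).pvalue
    p2 = stats.ttest_ind(drug2, control).pvalue
    pb = stats.ttest_ind(both, control).pvalue
    pattern += (p1 > 0.05) and (p2 > 0.05) and (pb < 0.05)

# With these settings the "clear synergy" conclusion would be drawn in
# roughly 40% of experiments, despite a purely additive combination.
print(f"'synergy' pattern under pure additivity: {pattern / reps:.0%}")
```

The moral is the standard one: the difference between “significant” and “not significant” is not itself statistically significant, so comparing the two single-drug p-values to the combined p-value tells you essentially nothing about interaction.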

Nooooooo, just make it stop, please!

Dan Kahan wrote: You should do a blog on this. I replied: I don’t like this article but I don’t really see the point in blogging on it. Why bother? Kahan: BECAUSE YOU REALLY NEVER HAVE EXPLAINED WHY. Gelman-Rubin critique of BIC is not responsive; you have something in mind—tell us what, pls! Inquiring minds want to know. Me: Wait, are you saying it’s not clear to you why I should hate that paper?? Kahan: YES!!!!!!! Certainly what you say about “model selection” aspects of BIC in Gelman-Rubin doesn’t apply. Me: OK, OK. . . . The paper is called Bayesian Benefits for the Pragmatic Researcher, and it’s by some authors whom I like and respect, but I don’t like what they’re doing. Here’s their abstract: The practical advantages of Bayesian inference are demonstrated here through two concrete examples. In the…

Emails I never bothered to answer

Posted by Andrew on 26 December 2016, 9:10 am So, this came in the email one day: Dear Professor Gelman, I would like to shortly introduce myself: I am editor in the ** Department at the publishing house ** (based in ** and **). As you may know, ** has taken over all journals of ** Press. We are currently restructuring some of the journals and are therefore looking for new editors for the journal **. You have published in the journal, you work in the field . . . your name was recommended by Prof. ** as a potential editor for the journal. . . . We think you would be an excellent choice and I would like to ask you kindly whether you are interested to become an editor of the journal. In…

p=.03, it’s gotta be true!

Posted by Andrew on 24 December 2016, 9:39 am Howie Lempel writes: Showing a white person a photo of Obama w/ artificially dark skin instead of artificially lightened skin before asking whether they support the Tea Party raises their probability of saying “yes” from 12% to 22%. 255-person Amazon Turk and Craigslist sample, p=.03. Nothing too unusual about this one. But it’s particularly grating when hyper-educated liberal elites use shoddy research to decide that their political opponents only disagree with them because they’re racist. https://www.washingtonpost.com/news/wonk/wp/2016/05/13/how-psychologists-used-these-doctored-obama-photos-to-get-white-people-to-support-conservative-politics/ https://news.stanford.edu/2016/05/09/perceived-threats-racial-status-drive-white-americans-support-tea-party-stanford-scholar-says/ Hey, they could have a whole series of this sort of experiment: – Altering the orange hue of Donald Trump’s skin and seeing if it affects how much people trust the guy . . . – Making Hillary Clinton fatter and seeing if that somehow makes her…
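For a sense of how fragile that p = .03 is, here’s a back-of-the-envelope check (assuming, hypothetically, two equal arms of about 127 subjects each; the actual design may have differed):

```python
import numpy as np
from scipy import stats

# Hypothetical equal split of the reported 255 subjects into two arms;
# the actual study design may differ.
n1 = n2 = 127
p1, p2 = 0.12, 0.22
x1, x2 = round(n1 * p1), round(n2 * p2)   # ~15 and ~28 "yes" responses

# Two-proportion z-test, pooled under the null.
pooled = (x1 + x2) / (n1 + n2)
se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (x2 / n2 - x1 / n1) / se
p = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, p = {p:.3f}")   # reproduces roughly p = .03

# 95% interval for the difference in proportions.
se_diff = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(f"difference = {p2 - p1:.2f} +/- {1.96 * se_diff:.2f}")
```

Under these assumptions the point estimate does give p near .03, but the 95% interval for the difference runs from roughly 1 to roughly 19 percentage points, which is consistent with nearly any substantive story, from a trivial effect to an implausibly enormous one.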

This is not news.

Posted by Andrew on 22 December 2016, 11:06 am Anne Pier Salverda writes: I’m not sure if you’re keeping track of published failures to replicate the power posing effect, but this article came out earlier this month: “Embodied power, testosterone, and overconfidence as a causal pathway to risk-taking.” From the abstract: We were unable to replicate the findings of the original study and subsequently found no evidence for our extended hypotheses. Gotta love that last sentence of the abstract: As our replication attempt was conducted in the Netherlands, we discuss the possibility that cultural differences may play a moderating role in determining the physiological and psychological effects of power posing. Let’s just hope that was a joke. Jokes are ok in academic papers, right?

Hark, hark! the p-value at heaven’s gate sings

Three different people pointed me to this post, in which food researcher and business school professor Brian Wansink advises Ph.D. students to “never say no”: When a research idea comes up, check it out, put some time into it and you might get some success. I like that advice and I agree with it. Or, at least, this approach worked for me when I was a student and it continues to work for me now, and my favorite students are those who follow this approach. That said, there could be some selection bias here: the students who say Yes to new projects are the ones who are more likely to be able to make use of such opportunities. Maybe the students who say No would just end up getting distracted and making no progress, were they to follow this…