Looking for data on speed and traffic accidents—and other examples of data that can be fit by nonlinear models

Posted by Andrew on 2 November 2017, 9:17 am. For the chapter in Regression and Other Stories that includes nonlinear regression, I’d like a couple of homework problems where the kids have to construct and fit models to real data. So I need some examples. We already have the success of golf putts as a function of distance from the hole, and I’d like some others. One thing that came to mind today, because I happened to see a safety warning poster on the bus reminding people not to drive too fast, is data on speed and traffic accidents. But I’m interested in other examples too. Just about anything interesting with data on x and y where there’s no simple linear…
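As a sketch of the kind of homework exercise this could become, here is a minimal nonlinear fit in Python. The speed and crash-rate numbers are invented for illustration (real data would come from highway-safety records), and the exponential form is just one plausible candidate, not a claim about the true relationship.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data: average speed (mph) and crash-injury rate per
# 100 million vehicle-miles.  These numbers are made up for illustration.
speed = np.array([25, 35, 45, 55, 65, 75], dtype=float)
rate = np.array([0.8, 1.1, 1.7, 2.9, 5.2, 9.5])

# One simple nonlinear candidate: risk grows exponentially with speed.
def risk(x, a, b):
    return a * np.exp(b * x)

params, _ = curve_fit(risk, speed, rate, p0=(0.5, 0.05))
a_hat, b_hat = params
print(f"fitted a = {a_hat:.3f}, b = {b_hat:.4f}")
```

A natural follow-up question for students: compare this fit to a power law or a quadratic, and ask which residual pattern looks most plausible.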

Advice for science writers!

I spoke today at a meeting of science journalists, in a session organized by Betsy Mason, also featuring Kristin Sainani, Christie Aschwanden, and Tom Siegfried. My talk was on statistical paradoxes of science and science journalism, and I mentioned the Ted Talk paradox, Who watches the watchmen, the Eureka bias, the “What does not kill my statistical significance makes it stronger” fallacy, the unbiasedness fallacy, selection bias in what gets reported, the Australia hypothesis, and how we can do better. Sainani gave some examples illustrating that journalists with no particular statistical or subject-matter expertise should be able to see through some of the claims made in published papers, where scientists misinterpret their own data or go far beyond what was implied by their data. Aschwanden and Siegfried talked about the confusions surrounding p-values and recommended that reporters pretty much forget…

My favorite definition of statistical significance

Posted by Andrew on 28 October 2017, 1:08 pm. From my 2009 paper with Weakliem: Throughout, we use the term statistically significant in the conventional way, to mean that an estimate is at least two standard errors away from some “null hypothesis” or prespecified value that would indicate no effect present. An estimate is statistically insignificant if the observed value could reasonably be explained by simple chance variation, much in the way that a sequence of 20 coin tosses might happen to come up 8 heads and 12 tails; we would say that this result is not statistically significantly different from chance. More precisely, the observed proportion of heads is 40 percent but with a standard error of 11 percent—thus, the data are less than two standard errors away from the null hypothesis of 50…
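The arithmetic in the coin-toss example can be checked directly:

```python
import math

# The 20-coin-toss example from the excerpt: 8 heads out of 20.
n, heads = 20, 8
p_hat = heads / n                        # observed proportion: 0.40
se = math.sqrt(p_hat * (1 - p_hat) / n)  # ~0.11, the "11 percent" in the text
z = (p_hat - 0.5) / se                   # distance from the null in SE units

print(f"p_hat = {p_hat:.2f}, se = {se:.2f}, |z| = {abs(z):.2f}")
# |z| is well under 2, so by the conventional definition this result is
# not statistically significantly different from chance.
```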

The Real World Interactive Learning Tutorial

Alekh and I have been polishing the Real World Interactive Learning tutorial for ICML 2017 on Sunday. This tutorial should be of pretty wide interest: for data scientists, we are crossing a threshold into easy use of interactive learning, while for researchers interactive learning is plausibly the most important frontier of understanding. Great progress on both the theory and especially on practical systems has been made since an earlier NIPS 2013 tutorial. Please join us if you are interested.

It’s hard to know what to say about an observational comparison that doesn’t control for key differences between treatment and control groups, chili pepper edition

Posted by Andrew on 3 August 2017, 9:55 am. Jonathan Falk points to this article and writes: Thoughts? I would have liked to have seen the data matched on age, rather than simply using age in a Cox regression, since I suspect that’s what’s really going on here. The non-chili eaters were much older, and I suspect that the failure to interact age, or at least specify the age effect more finely, has a gigantic impact here, especially since the raw inclusion of age raised the hazard ratio dramatically. Having controlled for Blood, Sugar, and Sex, the residual must be Magik. My reply: Yes, also they need to interact age x sex, and smoking is another…
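A quick simulation can illustrate the confounding worry raised here. The numbers below are entirely invented: age drives mortality, chili eating has no effect at all, but chili eaters skew younger, so a naive comparison makes chili look protective.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Invented setup: mortality depends only on age, not on chili at all,
# but younger people are more likely to eat chili.
age = rng.uniform(30, 80, n)
eats_chili = rng.random(n) < np.where(age < 50, 0.6, 0.2)
death = rng.random(n) < (age / 1000)   # risk rises with age only

naive = death[eats_chili].mean() / death[~eats_chili].mean()
print(f"naive risk ratio: {naive:.2f}")   # below 1: chili looks protective

# Compare within a narrow age band instead (a crude form of matching):
band = (age >= 50) & (age < 55)
adj = death[eats_chili & band].mean() / death[~eats_chili & band].mean()
print(f"age 50-55 risk ratio: {adj:.2f}")  # near 1: the 'effect' vanishes
```

Matching or finely stratifying on age, as suggested above, removes the artifact that a coarse age adjustment can leave behind.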

Seemingly intuitive and low math intros to Bayes never seem to deliver as hoped: Why?

This post was prompted by recent nicely done videos by Rasmus Baath that provide an intuitive and low-math introduction to Bayesian material. Now, I do not know that these have delivered less than he hoped for. Nor have I asked him. However, given similar material that I and others have tried out in the past that did not deliver what was hoped for, I am anticipating that and speculating why here. I have real doubts about such material actually enabling others to meaningfully interpret Bayesian analyses, let alone implement them themselves. For instance, in a conversation last year with David Spiegelhalter, his take was that some material I had could easily be followed by many, but the concepts that material was trying to get across were very subtle and few would have the background to connect to them. On the other…

Giving feedback indirectly by invoking a hypothetical reviewer

Posted by Andrew on 2 August 2017, 9:44 am. Ethan Bolker points us to this discussion on “How can I avoid being ‘the negative one’ when giving feedback on statistics?”, which begins: Results get sent around a group of biological collaborators for feedback. Comments come back from the senior members of the group about the implications of the results, possible extensions, etc. I look at the results and I tend not to be as good at the “big picture” stuff (I’m a relatively junior member of the team), but I’m reasonably good with statistics (and that’s my main role), so I look at the details. Sometimes I think to myself “I don’t think those conclusions are remotely justified by the data”. How can I give honest feedback in a way that doesn’t come…

Integrating Audio, Video, and Discussion Boards with Course Notes

As a biostatistics teacher I’ve spent a lot of time thinking about inverting the classroom and adding multimedia content. My first thought was to create YouTube videos corresponding to sections in my lecture notes. This typically entails recording the computer screen while going through slides, adding a voiceover. I realized that the maintenance of such videos is difficult, and this also creates a barrier to adding new content. In addition, the quality of the video image is lower than just having the student use a pdf viewer on the original notes. For those reasons I decided to create audio narration for the sections in the notes to largely capture what I would say during a live lecture. The audio mp3 files are stored on a local server and are streamed on demand when a student clicks on the audio icon…

DePaul University, School of Computing: Instructor in Data Science

Seeking Instructors in Data Science, with expertise and teaching experience in data science with an emphasis on computational statistics, data mining, data visualization, pattern recognition, or machine learning.

Machine Learning the Future Class

This spring, I taught a class on Machine Learning the Future at Cornell Tech covering a number of advanced topics in machine learning, including online learning, joint (structured) prediction, active learning, contextual bandit learning, logarithmic time prediction, and parallel learning. Each of these classes was recorded from the laptop via Zoom, and I just uploaded the recordings to YouTube. In some ways, this class is a followup to the large scale learning class I taught with Yann LeCun 4 years ago. The videos for that class were taken down(*), so these lectures both update and replace shared subjects as well as covering some new subjects. Much of this material is fairly close to research, so to assist other machine learning lecturers around the world in digesting the material, I’ve made all the source available as…
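As a taste of one of the listed topics, here is a minimal sketch of online learning: online gradient descent for logistic regression, processing one example at a time. The data and step size are invented for illustration; this is a generic textbook example, not code from the class.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([1.5, -2.0, 0.5])   # hidden "true" separator (invented)
w = np.zeros(3)                        # learner's weights, updated online
lr = 0.5                               # step size (arbitrary choice)
correct = 0
T = 5000
for t in range(T):
    x = rng.normal(size=3)
    y = 1.0 if x @ w_true > 0 else 0.0
    p = 1.0 / (1.0 + np.exp(-(x @ w)))  # predict BEFORE seeing the label
    correct += (p > 0.5) == (y == 1.0)
    w -= lr * (p - y) * x               # gradient step on the log loss

print(f"progressive accuracy: {correct / T:.3f}")
```

The "progressive" accuracy (measured on each example before updating) is the standard way to evaluate an online learner, since every prediction is made on unseen data.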