Have you ever asked yourself, “how should I approach the classic pre-post analysis?”

Well, maybe not, but this comes up all the time. An investigator wants to assess the effect of an intervention on an outcome. Study participants are randomized either to receive the intervention (could be a new drug, new protocol, behavioral intervention, whatever) or treatment as usual. For each participant, the outcome measure is recorded at baseline – this is the pre in pre/post analysis. The intervention is delivered (or not, in the case of the control group), some time passes, and the outcome is measured a second time. This is our post. The question is, how should we analyze this study to draw conclusions about the intervention’s effect on the outcome? There are at least three possible ways to approach this. (1) Ignore the pre outcome measure and just compare the average post scores of the two groups. (2) Calculate…
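As a rough illustration (not the code from the post), here is a minimal R sketch that simulates a two-arm pre/post study and runs the usual candidate analyses: a post-only comparison, a change-score model, and a baseline-adjusted (ANCOVA-style) model. The effect size and variable names are made up for the example.

```r
# a minimal sketch (not from the post): simulate a two-arm pre/post study
# and compare three common analysis strategies
set.seed(123)

n    <- 250                                   # per arm
pre  <- rnorm(2 * n, mean = 50, sd = 10)      # baseline outcome
rx   <- rep(0:1, each = n)                    # 0 = control, 1 = intervention
post <- 0.6 * pre + 2.5 * rx + rnorm(2 * n, 0, 8)   # assumed effect of 2.5

summary(lm(post ~ rx))                        # (1) post scores only
summary(lm(I(post - pre) ~ rx))               # (2) change scores
summary(lm(post ~ rx + pre))                  # (3) ANCOVA: adjust for baseline
```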

Importance sampling adds an interesting twist to Monte Carlo simulation

I’m contemplating the idea of teaching a course on simulation next fall, so I have been exploring various topics that I might include. (If anyone has great ideas, either because you have taught such a course or taken one, definitely drop me a note.) Monte Carlo (MC) simulation is an obvious one. I like the idea of talking about importance sampling, because it sheds light on the idea that not all MC simulations are created equal. I thought I’d do a brief blog to share some code I put together that demonstrates MC simulation generally, and shows how importance sampling can be an improvement. Like many of the topics I’ve written about, this is a vast one that certainly warrants much, much more than a blog entry. MC simulation in particular, since it is so fundamental to the practice of…
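To give a flavor of the idea (a generic sketch, not the code from the post): estimating a small tail probability such as P(Z > 3) for a standard normal works poorly with plain MC because almost no draws land in the tail, while importance sampling draws from a proposal centered in the region of interest and reweights by the density ratio.

```r
# a generic sketch (not the code from the post): estimate P(Z > 3), Z ~ N(0,1)
set.seed(123)
n <- 100000
true_p <- pnorm(3, lower.tail = FALSE)

# plain Monte Carlo: most draws never land in the tail
z <- rnorm(n)
mc_est <- mean(z > 3)

# importance sampling: draw from N(3, 1) and reweight by the density ratio
x <- rnorm(n, mean = 3)
w <- dnorm(x) / dnorm(x, mean = 3)            # target density / proposal density
is_est <- mean((x > 3) * w)

c(true = true_p, plain = mc_est, importance = is_est)
```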

Simulating a cost-effectiveness analysis to highlight new functions for generating correlated data

My dissertation work (which I only recently completed – in 2012 – even though I am not exactly young, a whole story on its own) focused on inverse probability weighting methods to estimate a causal cost-effectiveness model. I don’t really do any cost-effectiveness analysis (CEA) anymore, but it came up very recently when some folks in the Netherlands contacted me about using simstudy to generate correlated (and clustered) data to compare different approaches to estimating cost-effectiveness. As part of this effort, I developed two more functions in simstudy that allow users to generate correlated data drawn from different types of distributions. Earlier I had created the CorGen functions to generate multivariate data from a single distribution – e.g. multivariate gamma. Now, with the new CorFlex functions (genCorFlex and addCorFlex), users can mix and match distributions. The new version of simstudy is…
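The snippet below is a minimal sketch of how the new function might be used to mix distributions, say gamma-distributed costs alongside normally distributed effects. I am assuming the arguments mirror the CorGen functions (a correlation rho and a structure corstr); check ?genCorFlex for the actual interface.

```r
# a minimal sketch; argument names are assumed to mirror the CorGen functions
# (rho, corstr) - see ?genCorFlex for the actual interface
library(simstudy)

def <- defData(varname = "cost", formula = 5000, variance = 0.2, dist = "gamma")
def <- defData(def, varname = "qaly", formula = 0.7, variance = 0.05, dist = "normal")

set.seed(2017)
dd <- genCorFlex(n = 1000, defs = def, rho = 0.3, corstr = "cs")
dd[, cor(cost, qaly)]    # should land in the neighborhood of 0.3
```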

When there’s a fork in the road, take it. Or, taking a look at marginal structural models.

I am going to cut right to the chase, since this is the third of three posts related to confounding and weighting, and it’s kind of a long one. (If you want to catch up, the first two are here and here.) My aim with these three posts is to provide a basic explanation of the marginal structural model (MSM) and how we should interpret the estimates. This is obviously a very rich topic with a vast literature, so if you remain interested in the topic, I recommend checking out this (as of yet unpublished) textbook by Hernán & Robins for starters. The DAG below is a simple version of how things can get complicated very fast if we have sequential treatments or exposures that both affect and are affected by intermediate factors or conditions. \(A_0\) and \(A_1\) represent two treatment…
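To keep the mechanics concrete, here is a very simplified sketch (not from the post) for a single time point: estimate the treatment probabilities, form stabilized inverse probability weights, and fit a weighted outcome model. The sequential two-period setting in the post requires weights that multiply across periods, which this sketch does not attempt.

```r
# a simplified single-time-point sketch (the post handles sequential treatment):
# stabilized IP weights, then a weighted marginal outcome model
set.seed(123)
n <- 5000
L <- rbinom(n, 1, 0.4)                          # confounder
A <- rbinom(n, 1, plogis(-0.5 + 1.2 * L))       # treatment depends on L
Y <- 10 + 2 * A + 3 * L + rnorm(n)              # outcome depends on A and L

pA <- predict(glm(A ~ L, family = binomial), type = "response")
sw <- ifelse(A == 1, mean(A) / pA, (1 - mean(A)) / (1 - pA))  # stabilized weights

coef(lm(Y ~ A, weights = sw))["A"]              # marginal effect, close to 2
```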

When you use inverse probability weighting for estimation, what are the weights actually doing?

Towards the end of Part 1 of this short series on confounding, IPW, and (hopefully) marginal structural models, I talked a little bit about the fact that inverse probability weighting (IPW) can provide unbiased estimates of marginal causal effects in the context of confounding, just as more traditional regression models like OLS can. I used an example based on a normally distributed outcome. Now, that example wasn’t super interesting, because in the case of a linear model with homogeneous treatment effects (i.e. no interaction), the marginal causal effect is the same as the conditional effect (that is, conditional on the confounders). There was no real reason to use IPW in that example – I just wanted to illustrate that the estimates looked reasonable. But in many cases, the conditional effect is different from the marginal effect. (And in other cases, there…
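As a quick illustration of the "pseudo-population" idea (a generic sketch, not the post's code): after inverse probability weighting, the distribution of the confounder looks the same in the two exposure groups, which is exactly what the weights are doing.

```r
# a generic sketch of the pseudo-population idea: after IP weighting,
# the confounder distribution no longer differs by exposure group
set.seed(123)
n <- 10000
L <- rbinom(n, 1, 0.5)                          # confounder
A <- rbinom(n, 1, plogis(-1 + 2 * L))           # exposure depends on L

pA  <- predict(glm(A ~ L, family = binomial), type = "response")
ipw <- ifelse(A == 1, 1 / pA, 1 / (1 - pA))     # unstabilized weights

# unweighted: P(L = 1) differs sharply by exposure group
tapply(L, A, mean)
# weighted: both groups now reflect the overall P(L = 1) of about 0.5
tapply(L * ipw, A, sum) / tapply(ipw, A, sum)
```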

Characterizing the variance for clustered data that are Gamma distributed

Way back when I was studying algebra and wrestling with one word problem after another (I think now they call them story problems), I complained to my father. He laughed and told me to get used to it. “Life is one big word problem,” is how he put it. Well, maybe one could say any statistical analysis is really just some form of multilevel data analysis, whether we treat it that way or not. A key feature of the multilevel model is the ability to explicitly untangle the variation that occurs at different levels: variation of individuals within a sub-group, variation across sub-groups, variation across groups of sub-groups, and so on. The intra-class correlation coefficient (ICC) is one summary statistic that attempts to characterize the relative variability across the different levels. The amount of clustering as measured by the ICC has…
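A rough sketch of one way to think about this (my own toy version, not necessarily the post's): generate gamma outcomes whose means vary by cluster through a log-scale random effect, and characterize the ICC on the log scale, using the fact that the within-cluster variance of log(Y) for a gamma variable with dispersion d is trigamma(1/d).

```r
# a rough sketch (not from the post): gamma outcomes with a cluster-level
# random effect on the log scale, ICC approximated on that scale
set.seed(2024)
nclust <- 50; m <- 40
sigma2_b <- 0.4                          # between-cluster variance (log scale)
b  <- rnorm(nclust, 0, sqrt(sigma2_b))   # cluster effects
id <- rep(1:nclust, each = m)
mu <- exp(2 + b[id])                     # cluster-specific means
disp <- 1.2                              # gamma dispersion
y  <- rgamma(nclust * m, shape = 1 / disp, rate = 1 / (disp * mu))

# ICC on the log scale: between variance / (between + within),
# where the within-cluster variance of log(Y) is trigamma(1/disp)
sigma2_b / (sigma2_b + trigamma(1 / disp))
```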

Visualizing how confounding biases estimates of population-wide (or marginal) average causal effects

When we are trying to assess the effect of an exposure or intervention on an outcome, confounding is an ever-present threat to our ability to draw the proper conclusions. My goal (starting here and continuing in upcoming posts) is to think a bit about how to characterize confounding in a way that makes it possible to literally see why improperly estimating intervention effects might lead to bias. Confounding, potential outcomes, and causal effects: Typically, we think of a confounder as a factor that influences both exposure and outcome. If we ignore the confounding factor in estimating the effect of an exposure, we can easily over- or underestimate the size of the effect due to the exposure. If sicker patients are more likely than healthier patients to take a particular drug, the relatively poor outcomes of those who took the drug…
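Here is a bare-bones sketch (not the post's simulation) of the usual picture: a factor that drives both exposure and outcome makes the crude comparison misstate the effect, while the confounder-adjusted comparison recovers it. In this linear, no-interaction setting the adjusted and marginal effects coincide.

```r
# a bare-bones sketch of confounding: sicker patients (L = 1) are more likely
# to be exposed and also have worse outcomes
set.seed(123)
n <- 10000
L <- rbinom(n, 1, 0.4)                       # illness severity
A <- rbinom(n, 1, plogis(-1 + 1.5 * L))      # exposure more likely if sick
Y <- 70 - 5 * A - 10 * L + rnorm(n, 0, 5)    # true exposure effect is -5

coef(lm(Y ~ A))["A"]        # crude estimate: biased away from -5
coef(lm(Y ~ A + L))["A"]    # adjusted estimate: close to -5
```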

Thinking about different ways to analyze sub-groups in an RCT

Here’s the scenario: we have an intervention that we think will improve outcomes for a particular population. Furthermore, there are two sub-groups (let’s say defined by which of two medical conditions each person in the population has), and we are interested in knowing if the intervention effect is different for each sub-group. And here’s the question: what is the ideal way to set up a study so that we can assess (1) the intervention effects on the group as a whole, but also (2) the sub-group specific intervention effects? This is a pretty straightforward, textbook scenario. Sub-group analysis is common in many areas of research, including health services research where I do most of my work. It is definitely an advantage to know ahead of time if you want to do a sub-group analysis, as you would in designing a…
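One natural starting point (a generic sketch, not necessarily where the post lands) is a single model with a treatment-by-subgroup interaction, which provides both an overall effect estimate and subgroup-specific estimates. The data generation below is invented for illustration.

```r
# a generic sketch: two sub-groups (conditions "a" and "b"), randomized rx,
# and a possible difference in the treatment effect across sub-groups
set.seed(123)
n   <- 1000
grp <- sample(c("a", "b"), n, replace = TRUE)    # sub-group membership
rx  <- rbinom(n, 1, 0.5)                         # randomized treatment
y   <- 10 + 1.0 * rx + 2 * (grp == "b") + 1.5 * rx * (grp == "b") + rnorm(n)

summary(lm(y ~ rx))              # (1) overall intervention effect
summary(lm(y ~ rx * grp))        # (2) sub-group specific effects via interaction
```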

Who knew likelihood functions could be so pretty?

I just released a new iteration of simstudy (version 0.1.6), which fixes a bug or two and adds several spline-related routines (available on CRAN). The previous post focused on using spline curves to generate data, so I won’t repeat myself here. And, apropos of nothing really – I thought I’d take the opportunity to do a simple simulation to briefly explore the likelihood function. It turns out if we generate lots of them, it can be pretty, and maybe provide a little insight. If a probability density (or mass) function is more or less forward-looking – answering the question of what is the probability of seeing some future outcome based on some known probability model – the likelihood function is essentially backward-looking. The likelihood takes the data as given or already observed – and allows us to assess how likely…
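To make the "backward-looking" point concrete, here is a tiny sketch (separate from the post's prettier version): hold a small observed sample fixed and trace the normal log-likelihood as a function of the mean.

```r
# a tiny sketch: the likelihood treats the observed data as fixed and
# varies the parameter - here, the mean of a normal with known sd = 1
set.seed(123)
x  <- rnorm(20, mean = 2, sd = 1)               # "observed" data
mu <- seq(0, 4, length.out = 200)               # candidate values of the mean

loglik <- sapply(mu, function(m) sum(dnorm(x, mean = m, sd = 1, log = TRUE)))

plot(mu, loglik, type = "l",
     xlab = expression(mu), ylab = "log-likelihood")
abline(v = mu[which.max(loglik)], lty = 2)      # maximized near the sample mean
```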

Can we use B-splines to generate non-linear data?

I’m exploring the idea of adding a function or set of functions to the simstudy package that would make it possible to easily generate non-linear data. One way to do this would be using B-splines. Typically, one uses splines to fit a curve to data, but I thought it might be useful to switch things around a bit and use the underlying splines to generate data. This would facilitate exploring models where we know the assumption of linearity is violated. It would also make it easy to explore spline methods, because as with any other simulated data set, we would know the underlying data generating process. Splines in R: The bs function in the splines package returns values from these basis functions based on the specification of knots and degree of curvature. I wrote a wrapper function that uses the…
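The post goes on to describe a wrapper around bs; as a stand-in, here is a hedged sketch of the basic idea of generating non-linear data directly from a B-spline basis. The knots, degree, and coefficients are made up for illustration and are not the post's wrapper.

```r
# a sketch of the basic idea (the post's wrapper does more): build a B-spline
# basis with splines::bs, combine it with chosen coefficients to get a
# non-linear mean, then add noise
library(splines)

set.seed(123)
x     <- runif(500, 0, 1)
basis <- bs(x, knots = c(0.25, 0.5, 0.75), degree = 3, intercept = TRUE)
theta <- c(0.1, 0.8, 0.4, 0.9, 0.2, 0.6, 0.3)   # one coefficient per basis column
mu    <- basis %*% theta                        # underlying non-linear curve
y     <- mu + rnorm(500, 0, 0.1)

plot(x, y, pch = 20, col = "grey70")
lines(sort(x), mu[order(x)], lwd = 2)
```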