Recently, we were planning a study to evaluate the effect of an intervention on outcomes for very sick patients who show up in the emergency department. My collaborator had concerns about a phenomenon she had observed in other studies that might affect the results: patients measured earlier in the study tend to be sicker than those measured later in the study. This might not be a problem, but in the context of a stepped-wedge study design (see this for a discussion that touches on this type of study design), this could definitely generate biased estimates: when the intervention occurs later in the study (as it does in a stepped-wedge design), the “exposed” and “unexposed” populations could differ, and in turn so could the outcomes. We might mistake an artificial effect for an intervention effect. What could explain this phenomenon? The…

Original Post: Should we be concerned about incidence – prevalence bias?

# Posts by Keith Goldfeld

## Using simulation for power analysis: an example based on a stepped wedge study design

Simulation can be super helpful for estimating power or sample size requirements when the study design is complex. This approach has some advantages over an analytic one (i.e., one based on a formula), particularly the flexibility it affords in setting up the specific assumptions of the planned study, such as time trends, patterns of missingness, or effects of different levels of clustering. The downsides are the complexity of writing the code and the computation time, which can be a bit painful. My goal here is to show that at least writing the code need not be overwhelming. Recently, I was helping an investigator plan a stepped wedge cluster randomized trial to study the effects of modifying a physician support system on patient-level diabetes management. While analytic approaches for power calculations do exist in the context of this complex…

Original Post: Using simulation for power analysis: an example based on a stepped wedge study design
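
To make the idea of simulation-based power concrete, here is a minimal sketch in Python using only the standard library. It is not the stepped wedge analysis from the post: it assumes a plain two-arm comparison of a normally distributed outcome, tested with a large-sample z-test. The function names (`simulate_trial`, `estimate_power`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics
from statistics import NormalDist


def simulate_trial(n_per_arm, effect, sd, rng):
    """Simulate one two-arm trial; return a two-sided p-value (z-test approximation)."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
    treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of the difference in means, using the sample variances.
    se = (statistics.variance(control) / n_per_arm
          + statistics.variance(treated) / n_per_arm) ** 0.5
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(n_per_arm, effect, sd=1.0, alpha=0.05, n_sims=1000, seed=1234):
    """Estimate power as the share of simulated trials with p < alpha."""
    rng = random.Random(seed)
    rejections = sum(
        simulate_trial(n_per_arm, effect, sd, rng) < alpha for _ in range(n_sims)
    )
    return rejections / n_sims


if __name__ == "__main__":
    # Assumed scenario: 64 patients per arm, true effect of half a standard deviation.
    print(estimate_power(n_per_arm=64, effect=0.5))
```

The same loop extends to more complex designs by swapping in a different data-generating step and a different model fit inside `simulate_trial`; time trends, missingness, or clustering would all be added there.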

## simstudy update: two new functions that generate correlated observations from non-normal distributions

In an earlier post, I described in a fair amount of detail an algorithm to generate correlated binary or Poisson data. I mentioned that I would be updating simstudy with functions that would make generating these kinds of data relatively painless. Well, I have managed to do that, and the updated package (version 0.1.3) is available for download from CRAN. There are now two additional functions to facilitate the generation of correlated data from binomial, Poisson, gamma, and uniform distributions: genCorGen and addCorGen. Here’s a brief intro to these functions. Generate generally correlated data: genCorGen is an extension of genCorData, which was provided in earlier versions of simstudy to generate multivariate normal data. In the first example below, we are generating data from a multivariate Poisson distribution. To do this, we need to specify the mean of the Poisson…

Original Post: simstudy update: two new functions that generate correlated observations from non-normal distributions

## Copulas and correlated data generation: getting beyond the normal distribution

Using the simstudy package, it’s possible to generate correlated data from a normal distribution using the function genCorData. I’ve wanted to extend the functionality so that we can generate correlated data from other sorts of distributions; I thought it would be a good idea to begin with binary and Poisson distributed data, since those come up so frequently in my work. simstudy can already accommodate more general correlated data, but only in the context of a random effects data generation process. This might not be what we want, particularly if we are interested in explicitly generating data to explore marginal models (such as a GEE model) rather than a conditional random effects model (a topic I explored in my previous discussion). The extension can quite easily be done using copulas. Based on this definition, a copula is a “multivariate…

Original Post: Copulas and correlated data generation: getting beyond the normal distribution
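
The copula idea mentioned above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the simstudy implementation: it assumes a Gaussian copula, handles only a pair of variables, and uses hypothetical helper names (`poisson_quantile`, `correlated_poisson_pairs`, `pearson`). The steps are: draw correlated standard normals, map each to a uniform with the normal CDF, then push each uniform through the Poisson quantile function.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    u = min(u, 1 - 1e-12)  # guard against u == 1.0, which would never be reached
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def correlated_poisson_pairs(n, lam1, lam2, rho, seed=2024):
    """Generate n pairs of Poisson counts linked by a Gaussian copula (assumed)."""
    rng = random.Random(seed)
    norm = NormalDist()
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # Normal CDF gives correlated uniforms; quantiles give Poisson margins.
        u1, u2 = norm.cdf(z1), norm.cdf(z2)
        pairs.append((poisson_quantile(u1, lam1), poisson_quantile(u2, lam2)))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    data = correlated_poisson_pairs(n=5000, lam1=3, lam2=8, rho=0.6)
    xs, ys = zip(*data)
    print(sum(xs) / len(xs), sum(ys) / len(ys), pearson(xs, ys))
```

Note that `rho` here is the correlation of the underlying normals; the correlation of the resulting counts is generally somewhat smaller, because the transformation to discrete margins is not linear.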

## It can be easy to explore data generating mechanisms with the simstudy package

I learned statistics and probability by simulating data. Sure, I battled my way through proofs, but I never believed the results until I saw them in a simulation. I guess I have it backwards, but it worked for me. And now that I do this for a living, I continue to use simulation to understand models, to do sample size estimates and power calculations, and of course to teach. Sure, I’ll use the occasional formula, but I always feel the need to check it with simulation. It’s just the way I am. Since I found myself constantly setting up simulations, over time I developed ways to make the process a bit easier. Those processes turned into a package, which I called simstudy, short for simulating study data. My goal here is to introduce the basic idea behind simstudy, and provide a…

Original Post: It can be easy to explore data generating mechanisms with the simstudy package