This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I frequently draw on examples using my R package simstudy. Occasionally, I opine on other topics related to causal inference, evidence, and research more generally.

The design effect of a cluster randomized trial with baseline measurements

Is it possible to reduce the sample size requirements of a stepped wedge cluster randomized trial simply by collecting baseline information? In a trial with randomization at the individual level, measuring the outcome at two time points, first at baseline and then at follow-up, generally allows us to reduce the overall sample size. But does this extend to (a) cluster randomized trials generally, and to (b) stepped wedge designs more specifically?
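As a rough back-of-the-envelope illustration of the two pieces in play (this is not the calculation from the post, and all of the numbers are made up), the classic design effect inflates the required sample size for clustering, while baseline adjustment in an individually randomized trial deflates it by roughly a factor of (1 - r^2):

```r
n_ind <- 500   # hypothetical sample size, individual randomization, no baseline
r     <- 0.6   # assumed correlation between baseline and follow-up measurements
m     <- 25    # assumed number of subjects per cluster
icc   <- 0.05  # assumed intra-cluster correlation

n_ind * (1 - r^2)          # individual randomization, adjusting for baseline
deff <- 1 + (m - 1) * icc  # standard design effect for cluster randomization
n_ind * deff               # cluster randomization, no baseline measurement
```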

[Read More]

simstudy update: adding flexibility to data generation

A new version of simstudy (0.3.0) is now available on CRAN and on the package website. Along with some less exciting bug fixes, we have added capabilities to a few existing features: double-dot variable reference, treatment assignment, and categorical data definition. These simple additions should make the data generation process a little smoother and more flexible.

Using non-scalar double-dot variable reference

Double-dot notation was introduced in the last version of simstudy to allow data definitions to be more dynamic. Previously, a double-dot variable could only be a scalar value; with the current version, double-dot notation is also array-friendly.
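Here is a minimal sketch of the idea, assuming (as described in the 0.3.0 release notes) that a double-dot vector can be indexed by another field in the definition table; the variable names and values below are made up for illustration:

```r
library(simstudy)

# group-specific means stored outside the definition table
grp_means <- c(0, 5, 10)

def <- defData(varname = "grp", formula = "0.3;0.3;0.4", dist = "categorical")
def <- defData(def, varname = "y", formula = "..grp_means[grp]",
               variance = 1, dist = "normal")

dd <- genData(300, def)
```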

[Read More]

Sample size requirements for a Bayesian factorial study design

How do you determine sample size when the goal of a study is not to conduct a null hypothesis test but to provide an estimate of multiple effect sizes? I needed to get a handle on this for a recent grant submission, which I’ve been writing about over the past month, here and here. (I provide a little more context for all of this in those earlier posts.) The statistical inference in the study will be based on the estimated posterior distributions from a Bayesian model, so we would like those distributions to be as informative as possible. That means setting the sample size large enough to reduce the dispersion of those distributions to a useful level.
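The post works this out properly; purely as an illustration of the general idea (and not the approach used in the post), a sketch along these lines treats the width of the 95% credible interval as the criterion, with made-up effect and variance values and rstanarm standing in for the actual model:

```r
library(simstudy)
library(rstanarm)

def <- defData(varname = "rx", formula = 0.5, dist = "binary")
def <- defData(def, varname = "y", formula = "0.5 * rx",
               variance = 1, dist = "normal")

# width of the 95% credible interval for the treatment effect at sample size n
post_width <- function(n) {
  dd <- genData(n, def)
  fit <- stan_glm(y ~ rx, data = dd, refresh = 0)
  diff(posterior_interval(fit, prob = 0.95)["rx", ])
}

# average width over a handful of simulated data sets at a candidate n
mean(replicate(5, post_width(250)))
```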

[Read More]

A Bayesian analysis of a factorial design focusing on effect size estimates

Factorial study designs present a number of analytic challenges, not least of which is how to best understand whether simultaneously applying multiple interventions is beneficial. Last time I presented a possible approach that focuses on estimating the variance of effect size estimates using a Bayesian model. The scenario I used there focused on a hypothetical study evaluating two interventions with four different levels each. This time around, I am considering a proposed study that I am actually involved with, which aims to reduce emergency department (ED) use for patients living with dementia. This study would have three different interventions, but only two levels for each (i.e., yes or no), for a total of 8 arms. In this case, the model I proposed previously does not seem like it would work well; the posterior distributions based on the variance-based model turn out to be bimodal, making it quite difficult to interpret the findings. So, I decided to turn the focus away from variance and emphasize the effect size estimates for each arm compared to control.

[Read More]

Analyzing a factorial design by focusing on the variance of effect sizes

Way back in 2018, long before the pandemic, I described a soon-to-be implemented simstudy function genMultiFac that facilitates the generation of multi-factorial study data. I followed up that post with a description of how we can use these types of efficient designs to answer multiple questions in the context of a single study.

Fast forward three years, and I am thinking about these designs again for a new grant application that proposes to simultaneously study three interventions aimed at reducing emergency department (ED) use for people living with dementia. The primary interest is to evaluate each intervention on its own terms, but also to assess whether any combinations seem to be particularly effective. While this will be a fairly large cluster randomized trial, with about 80 EDs being randomized to one of the 8 possible combinations, I was concerned about our ability to estimate the interaction effects of multiple interventions with sufficient precision to draw useful conclusions, particularly if the combined effects of two or three interventions are less than additive. (That is, two interventions may be better than one, but not twice as good.)
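To make the structure concrete, here is a quick sketch of the 2x2x2 layout using genMultiFac; the argument names follow my reading of the simstudy documentation, and the cluster counts and column names are made up rather than taken from the proposed study:

```r
library(simstudy)

# 2^3 = 8 arms, 10 clusters per arm -> 80 randomized units
dmf <- genMultiFac(nFactors = 3, each = 10, levels = 2,
                   colNames = c("a", "b", "c"))

# check the balanced allocation across the 8 combinations
table(dmf$a, dmf$b, dmf$c)
```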

[Read More]

Drawing the wrong conclusion about subgroups: a comparison of Bayes and frequentist methods

In the previous post, I simulated data from a hypothetical RCT that had heterogeneous treatment effects across subgroups defined by three covariates. I presented two Bayesian models, a strongly pooled model and an unpooled version, each of which estimates all of the subgroup effects at once. I compared those estimates to a set of linear regression models fit to each subgroup separately.

My goal in doing these comparisons is to see how often we might draw the wrong conclusion about subgroup effects when we conduct these types of analyses. In a typical frequentist framework, the probability of making a mistake is usually considerably greater than the 5% error rate that we allow ourselves, because conducting multiple tests gives us more chances to make a mistake. By using Bayesian hierarchical models that share information across subgroups and measure uncertainty more realistically, I wanted to see whether we could reduce the chances of drawing the wrong conclusions.
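To see why multiple testing inflates the error rate, here is a quick illustration (a simplification, assuming the subgroup tests are independent, which they typically are not) of the family-wise error rate at alpha = 0.05:

```r
alpha <- 0.05
k <- 1:8                       # number of subgroup tests
fwer <- 1 - (1 - alpha)^k      # chance of at least one false positive
data.frame(tests = k, fwer = round(fwer, 3))
```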

[Read More]

Subgroup analysis using a Bayesian hierarchical model

I’m part of a team that recently submitted the results of a randomized clinical trial for publication in a journal. The overall findings of the study were inconclusive, and we certainly didn’t try to hide that fact in our paper. Of course, the story was a bit more complicated, as the RCT was conducted during various phases of the COVID-19 pandemic; the context in which the therapeutic treatment was provided changed over time. In particular, other new treatments became standard of care along the way, resulting in apparent heterogeneity in treatment effects for the therapy we were studying. It appears as if the treatment we were studying might have been effective only in one period, when alternative treatments were not available. While we planned to evaluate the treatment effect over time, it was not our primary planned analysis, and the journal objected to the inclusion of these secondary analyses.

[Read More]

Posterior probability checking with rvars: a quick follow-up

This is a relatively brief addendum to last week’s post, where I described how the rvar datatype implemented in the R package posterior makes it quite easy to perform posterior probability checks to assess goodness of fit. In the initial post, I generated data from a linear model and estimated parameters for a linear regression model, and, unsurprisingly, the model fit the data quite well. When I introduced a quadratic term into the data generating process and fit the same linear model (without a quadratic term), just as unsurprisingly, the model wasn’t a great fit.
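To give a flavor of the workflow (this is a stripped-down sketch, not the example from the posts, and the "posterior draws" are simulated rather than taken from a fitted model), the rvar type lets us generate a replicated data set for every draw and summarize across draws in a single line:

```r
library(posterior)

set.seed(726)
y <- rnorm(100, mean = 5, sd = 2)            # "observed" data

# stand-ins for posterior draws of mu and sigma; in practice these would come
# from as_draws_rvars() applied to a fitted Bayesian model
mu    <- rvar(rnorm(4000, mean(y), 0.2))
sigma <- rvar(abs(rnorm(4000, sd(y), 0.15)))

# one replicated data set per posterior draw
y_rep <- rvar_rng(rnorm, n = length(y), mean = mu, sd = sigma)

# posterior probability that the replicated maximum exceeds the observed one
Pr(rvar_max(y_rep) > max(y))
```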

[Read More]

Fitting your model is only the beginning: Bayesian posterior probability checks with rvars

Say we’ve collected data and estimated parameters of a model that give structure to the data. An important question to ask is whether the model is a reasonable approximation of the true underlying data generating process. If we did a good job, we should be able to turn around and generate data from the model itself that looks similar to the data we started with. And if we didn’t do such a great job, the newly generated data will diverge from the original.

[Read More]

Estimating a risk difference (and confidence intervals) using logistic regression

The odds ratio (OR) – the effect size parameter estimated in logistic regression – is notoriously difficult to interpret. It is a ratio of two quantities (odds, under different conditions) that are themselves ratios of probabilities. I think it is pretty clear that a very large or small OR implies a strong treatment effect, but translating that effect into a clinical context can be challenging, particularly since an OR does not map onto a unique difference in probabilities without knowing the underlying risk.
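One common way to get at a risk difference from a logistic model (a sketch of the general standardization idea, not necessarily the method described in the post, with made-up parameter values) is to average the predicted probabilities with everyone set to treatment and then to control:

```r
library(simstudy)

def <- defData(varname = "rx", formula = 0.5, dist = "binary")
def <- defData(def, varname = "y", formula = "-1 + 0.8 * rx",
               dist = "binary", link = "logit")

dd <- as.data.frame(genData(500, def))

fit <- glm(y ~ rx, data = dd, family = binomial)

# marginal standardization: predicted risk if everyone were treated vs control
p1 <- mean(predict(fit, within(dd, rx <- 1), type = "response"))
p0 <- mean(predict(fit, within(dd, rx <- 0), type = "response"))
p1 - p0   # estimated risk difference (a bootstrap would provide the CI)
```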

[Read More]