This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I frequently draw on examples using my R package simstudy. Occasionally, I opine on other topics related to causal inference, evidence, and research more generally.

A new simstudy function to make simulating replications easier

Four years ago, I described a simple framework for organizing simulations to conduct power analyses or explore the operating characteristics of modeling approaches. In that framework, I introduced a small function, scenario_list, that generated a list of scenarios forming the basis for simulations. I had always intended to incorporate that function into simstudy, and now I have finally done so. The new function is available as of version 0.9.0.

This post offers a brief introduction to the function and concludes with a small simulation.
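To give a flavor of the idea, here is a minimal base-R sketch of what a scenario list does (the function name and arguments here are illustrative; the actual simstudy interface is described in the full post):

```r
# a minimal base-R sketch: every combination of the simulation parameters
# becomes one scenario, and each scenario is then replicated many times
scenario_list <- function(...) {
  scenarios <- expand.grid(..., stringsAsFactors = FALSE)
  split(scenarios, seq_len(nrow(scenarios)))
}

scenarios <- scenario_list(n = c(100, 200), effect = c(0.25, 0.50))
length(scenarios)  # 4 scenarios, each a one-row data.frame of parameters
```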

[Read More]

Planning for a 3-arm cluster randomized trial with a nested intervention and a time-to-event outcome

A researcher recently approached me for advice on a cluster-randomized trial he is developing. He is interested in testing the effectiveness of two interventions and wondered whether a 2×2 factorial design might be the best approach.

As we discussed the interventions (I’ll call them A and B), it became clear that A was the primary focus. Intervention B might enhance the effectiveness of A, but on its own, B was not expected to have much impact. (It’s also possible that A alone doesn’t work, but once B is in place, the combination may reap benefits.) Given this, it didn’t seem worthwhile to randomize clinics or providers to receive B alone. We agreed that a three-arm cluster-randomized trial—with (1) control, (2) A alone, and (3) A + B—would be a more efficient and relevant design.
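For readers who want to see the structure of the design, here is a hedged simstudy sketch of the randomization scheme (cluster sizes, variance, and variable names are purely illustrative; the full simulation is in the post):

```r
library(simstudy)
library(data.table)

# cluster-level definitions: a site-level random effect and a fixed number of patients
defc <- defData(varname = "ceff", formula = 0, variance = 0.04, dist = "normal", id = "site")
defc <- defData(defc, varname = "npat", formula = 40, dist = "nonrandom")

dc <- genData(24, defc)                         # 24 sites
dc <- trtAssign(dc, nTrt = 3, grpName = "arm")  # 1 = control, 2 = A alone, 3 = A + B

# individual-level data nested within sites
dd <- genCluster(dc, cLevelVar = "site", numIndsVar = "npat", level1ID = "id")
dd[, .N, keyby = arm]                           # balanced allocation across the three arms
```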

[Read More]

Bayesian proportional hazards model for a stepped-wedge design

We’ve finally reached the end of the road. This is the fifth and last post in a series building up to a Bayesian proportional hazards model for analyzing a stepped-wedge cluster-randomized trial. If you are just joining in, you may want to start at the beginning.

The model presented here integrates non-linear time trends and cluster-specific random effects—elements we’ve previously explored in isolation. There’s nothing fundamentally new in this post; it brings everything together. Given that the groundwork has already been laid, I’ll keep the commentary brief and focus on providing the code.
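As a rough sketch of what that means (my notation here, not necessarily the exact parameterization used in the post), the hazard for individual $i$ in cluster $j$ can be written as

$$
\lambda_{ij}(t) = \lambda_0(t) \exp\left\{ \beta A_j(t) + s(t) + b_j \right\}, \qquad b_j \sim N(0, \sigma_b^2)
$$

where $A_j(t)$ is the cluster’s treatment status at time $t$ under the stepped-wedge rollout, $s(t)$ is a penalized spline for the secular trend, and $b_j$ is the cluster-specific random effect.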

[Read More]

A Bayesian proportional hazards model for a cluster randomized trial

In recent posts, I introduced a Bayesian approach to proportional hazards modeling and then extended it to incorporate a penalized spline. (There was also a third post on handling ties when multiple individuals share the same event time.) This post describes another extension: a random effect to account for clustering in a cluster randomized trial. With this in place, I’ll be ready to tackle the final step—building a model for analyzing a stepped-wedge cluster-randomized trial that incorporates both splines and site-specific random effects.

[Read More]

Accounting for ties in a Bayesian proportional hazards model

Over my past few posts, I’ve been progressively building towards a Bayesian model for a stepped-wedge cluster randomized trial with a time-to-event outcome, where time will be modeled using a spline function. I started with a simple Cox proportional hazards model for a traditional RCT, ignoring time as a factor. In the next post, I introduced a nonlinear time effect. For the third post—one I initially thought was ready to publish—I extended the model to a cluster randomized trial without explicitly incorporating time. I was then working on the grand finale, the full model, when I ran into an issue: I couldn’t recover the effect-size parameter used to generate the data.

[Read More]

A Bayesian proportional hazards model with a penalized spline

In my previous post, I outlined a Bayesian approach to proportional hazards modeling. This post serves as an addendum, providing code to incorporate a spline to model a time-varying hazard ratio non-linearly. In a second addendum, still to come, I will present a separate model with a site-specific random effect, essential for a cluster-randomized trial. These components lay the groundwork for analyzing a stepped-wedge cluster-randomized trial, where both splines and site-specific random effects will be integrated into a single model. I plan on describing this comprehensive model in a final post.
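If you just want a feel for the spline piece, here is a small illustrative snippet (not the code from the post) that builds a B-spline basis for time using the splines package; in the Bayesian model, each basis column gets a coefficient, and a penalty on those coefficients is what makes the spline “penalized”:

```r
library(splines)

# an illustrative B-spline basis for time; in the model, the log hazard ratio
# at time t is a linear combination of these basis columns
times <- seq(0.1, 24, length.out = 200)
B <- bs(times, df = 5)

dim(B)  # 200 x 5: five basis functions evaluated at each time point
```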

[Read More]

Estimating a Bayesian proportional hazards model

A recent conversation with a colleague about a large stepped-wedge cluster randomized trial (SW-CRT) piqued my interest, because the primary outcome is time-to-event. This is not something I’ve seen before. A quick dive into the literature suggested that time-to-event outcomes are uncommon in SW-CRTs, and that the best analytic approach is not obvious. I was intrigued by how to analyze the data to estimate a hazard ratio while accounting for clustering and potential secular trends that might influence the time to the event.

[Read More]

Thinking about covariates in an analysis of an RCT

I was recently discussing the analytic plan for a randomized controlled trial (RCT) with a clinical collaborator when she asked whether it’s appropriate to adjust for pre-specified baseline covariates. This question is so interesting because it touches on fundamental issues of inference—both causal and statistical. What is the target estimand in an RCT—that is, what effect are we actually measuring? What do we hope to learn from the specific sample recruited for the trial (i.e., how can the findings be analyzed in a way that enhances generalizability)? What underlying assumptions about replicability, resampling, and uncertainty inform the arguments for and against covariate adjustment? These are big questions, which won’t necessarily be answered here, but need to be kept in mind when thinking about the merits of covariate adjustment.
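As a quick, purely illustrative example of why this matters (variable names and effect sizes are made up), adjusting for a prognostic baseline covariate in a simulated RCT typically leaves the treatment estimate unbiased while shrinking its standard error:

```r
library(simstudy)

# hypothetical definitions: baseline covariate x is prognostic for outcome y
defx <- defData(varname = "x", formula = 0, variance = 1, dist = "normal")
defy <- defDataAdd(varname = "y", formula = "0.5*rx + 0.8*x", variance = 1, dist = "normal")

set.seed(123)
dd <- genData(500, defx)
dd <- trtAssign(dd, nTrt = 2, grpName = "rx")
dd <- addColumns(defy, dd)

# unadjusted vs. covariate-adjusted estimates of the same treatment effect
summary(lm(y ~ rx, data = dd))$coefficients["rx", ]
summary(lm(y ~ rx + x, data = dd))$coefficients["rx", ]
```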

[Read More]

Can ChatGPT help construct non-trivial statistical models? An example with Bayesian "random" splines

I’ve been curious to see how helpful ChatGPT can be for implementing relatively complicated models in R. About two years ago, I described a model for estimating a treatment effect in a cluster-randomized stepped-wedge trial. We used a generalized additive model (GAM) with site-specific splines to account for general time trends, implemented using the mgcv package. I’ve been interested in exploring a Bayesian version of this model, but hadn’t found the time to try, until I happened to pose this simple question to ChatGPT:
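(For context, a frequentist version of that model can be fit with mgcv’s factor-smooth basis; the snippet below is only a toy illustration with made-up data and variable names, not the code from the original post.)

```r
library(mgcv)
library(data.table)

# toy stepped-wedge-style data: 10 sites, each switching to treatment at a different time
set.seed(7)
dd <- data.table(site = factor(rep(1:10, each = 50)), time = rep(1:50, 10))
dd[, trt := as.integer(time > 4 * as.integer(site))]
dd[, y := 0.5 * trt + sin(time / 8 + as.integer(site)) + rnorm(.N, 0, 0.5)]

# site-specific smooth time trends (factor-smooth basis) plus a global treatment effect
fit <- gam(y ~ trt + s(time, site, bs = "fs", k = 5), data = dd)
summary(fit)$p.table
```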

[Read More]

An IV study design to estimate an effect size when randomization is not ethical

An investigator I frequently consult with seeks to estimate the effect of a palliative care treatment protocol for patients nearing end-stage disease, compared to a more standard, though potentially overly burdensome, therapeutic approach. Ideally, we would conduct a two-arm randomized clinical trial (RCT) to create comparable groups and obtain an unbiased estimate of the intervention effect. However, in this case, it may be considered unethical to randomize patients to a non-standard protocol.

[Read More]