Accounting for ties in a Bayesian proportional hazards model

Over my past few posts, I’ve been progressively building towards a Bayesian model for a stepped-wedge cluster randomized trial with a time-to-event outcome, where time will be modeled using a spline function. I started with a simple Cox proportional hazards model for a traditional RCT, ignoring time as a factor. In the next post, I introduced a nonlinear time effect. For the third post—one I initially thought was ready to publish—I extended the model to a cluster randomized trial without explicitly incorporating time. [Read More]

A Bayesian proportional hazards model with a penalized spline

In my previous post, I outlined a Bayesian approach to proportional hazards modeling. This post serves as an addendum, providing code to incorporate a spline that models a time-varying hazard ratio nonlinearly. In a second addendum to come, I will present a separate model with a site-specific random effect, essential for a cluster-randomized trial. These components lay the groundwork for analyzing a stepped-wedge cluster-randomized trial, where both splines and site-specific random effects will be integrated into a single model. [Read More]

Estimating a Bayesian proportional hazards model

A recent conversation with a colleague about a large stepped-wedge cluster randomized trial (SW-CRT) piqued my interest, because the primary outcome is time-to-event. This is not something I’ve seen before. A quick dive into the literature suggested that time-to-event outcomes are uncommon in SW-CRTs, and that the best analytic approach is not obvious. I was intrigued by how to analyze the data to estimate a hazard ratio while accounting for clustering and potential secular trends that might influence the time to the event. [Read More]

Thinking about covariates in an analysis of an RCT

I was recently discussing the analytic plan for a randomized controlled trial (RCT) with a clinical collaborator when she asked whether it’s appropriate to adjust for pre-specified baseline covariates. This question is so interesting because it touches on fundamental issues of inference—both causal and statistical. What is the target estimand in an RCT—that is, what effect are we actually measuring? What do we hope to learn from the specific sample recruited for the trial (i. [Read More]

Can ChatGPT help construct non-trivial statistical models? An example with Bayesian "random" splines

I’ve been curious to see how helpful ChatGPT can be for implementing relatively complicated models in R. About two years ago, I described a model for estimating a treatment effect in a cluster-randomized stepped wedge trial. We used a generalized additive model (GAM) with site-specific splines to account for general time trends, implemented using the mgcv package. I’ve been interested in exploring a Bayesian version of this model, but hadn’t found the time to try - until I happened to pose this simple question to ChatGPT: [Read More]

An IV study design to estimate an effect size when randomization is not ethical

An investigator I frequently consult with seeks to estimate the effect of a palliative care treatment protocol for patients nearing end-stage disease, compared to a more standard, though potentially overly burdensome, therapeutic approach. Ideally, we would conduct a two-arm randomized clinical trial (RCT) to create comparable groups and obtain an unbiased estimate of the intervention effect. However, in this case, it may be considered unethical to randomize patients to a non-standard protocol. [Read More]

Generating binary data by specifying the relative risk, with simulations

The most traditional approach for analyzing binary outcome data is logistic regression, where the estimated parameters are interpreted as log odds ratios or, if exponentiated, as odds ratios (ORs). No one other than statisticians (and maybe not even statisticians) finds the odds ratio to be a very intuitive statistic, and many feel that a risk difference or a risk ratio (also called a relative risk, RR) is much more interpretable. Indeed, there seems to be a strong belief that readers will, more often than not, interpret odds ratios as risk ratios. [Read More]

simstudy: another way to generate data from a non-standard density

One of my goals for the simstudy package is to make it as easy as possible to generate data from a wide range of distributions. The recent update created the possibility of generating data from a customized distribution specified in a user-defined function. Last week, I added two functions, genDataDist and addDataDist, that allow data generation from an empirical distribution defined by a vector of integers. (See here for how to download the latest development version. [Read More]

simstudy 0.8.0: customized distributions

Over the past few years, a number of folks have asked if simstudy accommodates customized distributions. There’s been interest in truncated, zero-inflated, or even more standard distributions that haven’t been implemented in simstudy. While I’ve come up with approaches for some of the specific cases, I was never able to develop a general solution that could provide broader flexibility. This shortcoming changes with the latest version of simstudy, now available on CRAN. [Read More]

simstudy enhancement: specifying idiosyncratic follow-up times for longitudinal data

A researcher reached out to me a few weeks ago. They were trying to generate longitudinal data that included irregularly spaced follow-up periods. The default periods generated by the function addPeriods in the simstudy package are $\{0, 1, 2, \ldots, n - 1\}$, where there are $n$ total periods. However, when follow-up periods required more specificity, such as $\{0, 90, 180, 365\}$ days from baseline, users had to manually add them. [Read More]