Can ChatGPT help construct non-trivial statistical models? An example with Bayesian "random" splines

I’ve been curious to see how helpful ChatGPT can be for implementing relatively complicated models in R. About two years ago, I described a model for estimating a treatment effect in a cluster-randomized stepped wedge trial. We used a generalized additive model (GAM) with site-specific splines to account for general time trends, implemented using the mgcv package. I’ve been interested in exploring a Bayesian version of this model, but hadn’t found the time to try, until I happened to pose this simple question to ChatGPT: [Read More]

Including uncertainty when comparing response rates across clusters

Since this is a holiday weekend here in the US, I thought I would write up something relatively short and simple since I am supposed to be relaxing. A few weeks ago, someone presented me with some data that showed response rates to a survey that was conducted at about 30 different locations. The team that collected the data was interested in understanding if there were some sites that had response rates that might have been too low. [Read More]

Skeptical Bayesian priors might help minimize skepticism about subgroup analyses

Over the past couple of years, I have been working with an amazing group of investigators as part of the CONTAIN trial to study whether COVID-19 convalescent plasma (CCP) can improve the clinical status of patients hospitalized with COVID-19 and requiring noninvasive supplemental oxygen. This was a multi-site study in the US that randomized 941 patients to either CCP or a saline solution placebo. The overall findings suggest that CCP did not benefit the patients who received it, but if you drill down a little deeper, the story may be more complicated than that. [Read More]

Controlling Type I error in RCTs with interim looks: a Bayesian perspective

Recently, a colleague submitted a paper describing the results of a Bayesian adaptive trial where the research team estimated the probability of effectiveness at various points during the trial. This trial was designed to stop as soon as the probability of effectiveness exceeded a pre-specified threshold. The journal rejected the paper on the grounds that these repeated interim looks inflated the Type I error rate, and increased the chances that any conclusions drawn from the study could have been misleading. [Read More]

Sample size requirements for a Bayesian factorial study design

How do you determine sample size when the goal of a study is not to conduct a null hypothesis test but to provide an estimate of multiple effect sizes? I needed to get a handle on this for a recent grant submission, which I’ve been writing about over the past month, here and here. (I provide a little more context for all of this in those earlier posts.) The statistical inference in the study will be based on the estimated posterior distributions from a Bayesian model, so it seems like we’d like those distributions to be as informative as possible. [Read More]

A Bayesian analysis of a factorial design focusing on effect size estimates

Factorial study designs present a number of analytic challenges, not least of which is how to best understand whether simultaneously applying multiple interventions is beneficial. Last time I presented a possible approach that focuses on estimating the variance of effect size estimates using a Bayesian model. The scenario I used there focused on a hypothetical study evaluating two interventions with four different levels each. This time around, I am considering a proposed study to reduce emergency department (ED) use for patients living with dementia that I am actually involved with. [Read More]

Analyzing a factorial design by focusing on the variance of effect sizes

Way back in 2018, long before the pandemic, I described a soon-to-be implemented simstudy function genMultiFac that facilitates the generation of multi-factorial study data. I followed up that post with a description of how we can use these types of efficient designs to answer multiple questions in the context of a single study. Fast forward three years, and I am thinking about these designs again for a new grant application that proposes to simultaneously study three interventions aimed at reducing emergency department (ED) use for people living with dementia. [Read More]

Drawing the wrong conclusion about subgroups: a comparison of Bayes and frequentist methods

In the previous post, I simulated data from a hypothetical RCT that had heterogeneous treatment effects across subgroups defined by three covariates. I presented two Bayesian models, a strongly pooled model and an unpooled version, that could be used to estimate all the subgroup effects in a single model. I compared the estimates to a set of linear regression models that were estimated for each subgroup separately. My goal in doing these comparisons is to see how often we might draw the wrong conclusion about subgroup effects when we conduct these types of analyses. [Read More]

Subgroup analysis using a Bayesian hierarchical model

I’m part of a team that recently submitted the results of a randomized clinical trial for publication in a journal. The overall findings of the study were inconclusive, and we certainly didn’t try to hide that fact in our paper. Of course, the story was a bit more complicated, as the RCT was conducted during various phases of the COVID-19 pandemic; the context in which the therapeutic treatment was provided changed over time. [Read More]

Posterior probability checking with rvars: a quick follow-up

This is a relatively brief addendum to last week’s post, where I described how the rvar datatype implemented in the R package posterior makes it quite easy to perform posterior probability checks to assess goodness of fit. In the initial post, I generated data from a linear model and estimated parameters for a linear regression model, and, unsurprisingly, the model fit the data quite well. When I introduced a quadratic term into the data generating process and fit the same linear model (without a quadratic term), equally unsurprisingly, the model wasn’t a great fit. [Read More]