An investigator I frequently consult with seeks to estimate the effect of a palliative care treatment protocol for patients nearing end-stage disease, compared to a more standard, though potentially overly burdensome, therapeutic approach. Ideally, we would conduct a two-arm randomized clinical trial (RCT) to create comparable groups and obtain an unbiased estimate of the intervention effect. However, in this case, it may be considered unethical to randomize patients to a non-standard protocol.
Alternatively, we could conduct an observational study, measuring outcomes for patients who choose one of the two protocols. This approach would yield unbiased estimates only if we could collect all relevant data to ensure the absence of unmeasured confounding. While such an analysis could be useful, it’s uncertain whether we could convincingly argue that the estimates are truly unbiased, though we might assess their sensitivity to unmeasured confounding.
A potential middle ground involves randomizing in a way that merely increases the likelihood that a patient will choose the palliative care protocol. If successful, this could allow us to estimate a specific causal effect—the Complier Average Treatment Effect (CATE). The CATE might offer valuable insights into the relative merits of the two approaches. The key advantage of this approach is that it enables an unbiased estimate of the CATE, even in the presence of unmeasured confounding.
My goal here is to introduce you to this instrumental variable (IV) design and present simulations that demonstrate its potential strengths. IV analysis is a widely used estimation method across various fields, including economics, sociology, and epidemiology.
Observational study with no unmeasured confounding
The directed acyclic graph (DAG) below represents an observational dataset with two predictors, \(X\) and \(U\), where only \(X\) is measured. In this scenario, we can estimate the average treatment effect by fitting a regression model that adjusts for \(X\), assuming \(X\) is equally distributed across the palliative care and standard therapy groups. Although UU, the unmeasured covariate, predicts the outcome, it does not predict the selected protocol. Therefore, UU is not a confounder and does not need to be included in the model to ensure an unbiased estimate.
The data definitions for the simulation follow from the DAG. In the simulation, the variable \(b\) is the association between \(U\) and the chosen protocol. In this first case, \(b\) will be set to zero so that \(U\) is only a predictor of the outcome, but not the protocol. In the data generation, we generate potential outcomes \(Y^0\) and \(Y^1\) for each individual. These are the outcomes we would observe for the same individual if they receive the standard therapy or the palliative care protocol, respectively. The causal effect at the individual level is \(Y^1 - Y^0\). (I’ve discussed causal inference elsewhere - for example, here, here, and here. If you want a really good guide - and even includes a chapter on IV - you can’t really do any better than the book Causal Inference: What If by Hernán and Robins and available online.) The observed outcome \(Y\) is the potential outcome for the protocol actually selected. In this case (and all the others that follow), I assume a treatment effect of 2 (with some variation across individuals.)
Here are the necessary libraries for all the simulations that follow and then the data definitions just described:
library(simstudy)
library(data.table)
library(broom)
library(AER)
def <-
defData(varname = "X", formula = 0, variance = 1, dist = "normal") |>
defData(varname = "U", formula = 0, variance = 1, dist = "normal") |>
defData(varname = "P", formula = "-1 + .2*X + ..b*U", dist = "binary", link="logit") |>
defData(varname = "Y0", formula = "X + U", variance = 16, dist = "normal") |>
defData(varname = "Y1", formula = "2 + Y0", variance = 0.75, dist = "normal") |>
defData(varname = "Y", formula = "(P==0)*Y0 + (P==1)*Y1", dist = "nonrandom")
I’m generating a large sample to reduce sample variability:
set.seed(9434)
b <- 0
dd <- genData(10000, def)
dd
## Key: <id>
## id X U P Y0 Y1 Y
## <int> <num> <num> <int> <num> <num> <num>
## 1: 1 0.56455033 -0.3995340 0 -1.129070 -0.4144314 -1.129070
## 2: 2 0.47370692 0.8632443 0 1.198505 4.0695024 1.198505
## 3: 3 0.14111773 -0.5722770 1 2.893212 6.1998168 6.199817
## 4: 4 -0.33594059 0.7847092 0 2.796105 5.1121294 2.796105
## 5: 5 -1.58531504 -0.4670841 1 -4.371836 -2.2362033 -2.236203
## ---
## 9996: 9996 -1.31839289 -0.4082700 0 2.458730 5.4139270 2.458730
## 9997: 9997 -1.37300260 0.9126086 0 3.820652 5.3291210 3.820652
## 9998: 9998 -1.37026038 -0.4491965 0 -1.145322 0.3858421 -1.145322
## 9999: 9999 -2.31914806 0.4899357 0 1.070460 4.4608188 1.070460
## 10000: 10000 -0.04254774 0.6394997 1 2.434553 4.9220870 4.922087
The true average casual effect is close to two, as expected. This is something we cannot observe:
dd[, mean(Y1 - Y0)]
## [1] 1.984264
If we estimate a linear model adjusting for \(X\), the parameter for \(P\) should also be close to two, which it is:
tidy(lm(Y ~ P + X, data = dd))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0324 0.0492 -0.658 5.10e- 1
## 2 P 2.03 0.0933 21.8 3.96e-103
## 3 X 0.957 0.0418 22.9 8.37e-113
Observational study with unmeasured confounding
The DAG below now has an arrow pointing from \(U\) to the protocol, so \(b\) is non-zero. If we are not able to measure \(U\) and control for it in the model, we will get a biased estimate of the treatment effect.
In this case, the unobserved average causal effect is still very close to two.
set.seed(9434)
b <- 0.2
dd <- genData(10000, def)
dd[, mean(Y1 - Y0)]
## [1] 1.984264
However, if we fit a model without accounting for the unmeasured confounder \(U\), we get a biased (upward) estimate for the effect of the palliative care protocol:
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0699 0.0492 -1.42 1.56e- 1
## 2 P 2.17 0.0933 23.3 1.48e-116
## 3 X 0.952 0.0419 22.8 7.88e-112
Adjusting for \(U\) removes the bias, though this would not be possible if we could not measure \(U\):
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0270 0.0480 -0.563 5.73e- 1
## 2 P 1.99 0.0911 21.8 2.46e-103
## 3 X 0.943 0.0407 23.1 1.79e-115
## 4 U 0.970 0.0411 23.6 4.96e-120
IV design
Since we don’t believe it’s possible to measure all potential confounders, nor do we think conducting a standard RCT is appropriate or feasible, we might want to consider a design that randomizes patients to a setting where they are more likely to choose palliative care over the standard protocol. By establishing a counseling program that encourages patients to opt for palliative care when appropriate, we can use the variation in treatment decisions to estimate a causal effect. Crucially, patients in both the counseling and non-counseling arms can choose either palliative care or the standard protocol; the expectation is that those in the counseling arm will be more likely to choose palliative care.
It’s important to be clear that our interest lies in the effect of palliative care versus the standard protocol, not in the counseling program itself. A direct comparison between the counseling and non-counseling groups would not provide the information we need, as each group is likely to include patients receiving both types of care. Therefore, such a comparison would be ambiguous. Moreover, if we compare palliative care and standard protocol within the counseling arm, we face the same issue of unmeasured confounding as in a purely observational study.
Additionally, comparing patients who opt for palliative care in the counseling arm with those who choose the standard protocol in the non-counseling arm could lead to problems. Some patients in the non-counseling arm may have chosen standard care regardless of their group assignment, making them poor counterfactuals for those in the counseling arm who opted for palliative care. Similarly, some patients in the counseling arm who chose palliative care might have done so even without counseling, so including them in a comparison with non-counseling patients who opted for standard care may not be meaningful.
This situation illustrates a taxonomy of patient behavior essential to instrumental variable (IV) analysis, known as principal stratification. In this framework, patients can be categorized as never-takers (those who choose standard care regardless of their assignment), always-takers (those who choose palliative care regardless), or compliers (those who choose standard care in the non-counseling arm but opt for palliative care in the counseling arm). There is also a fourth category, deniers, which we assume does not exist.
In IV analysis, we compare the compliers across arms, effectively sidestepping the never-takers and always-takers. Although we can’t definitively classify individual patients as compliers or never-takers, we assume that a patient in the non-counseling arm who chooses palliative care is an always-taker. Similarly, we assume that a patient in the counseling arm who opts for standard care would have done so even without counseling.
The instrument in IV analysis is the characteristic or assignment that induces differential distributions of the exposures of interest. In this case, the instrument is the assignment to counseling or not, and the exposure of interest is palliative care vs. the standard protocol. The target estimand in IV analysis is the complier average treatment effect or CATE. CATE can be estimated using IV analysis under a set of four key assumptions. First, we assume monotonicity of behavior - so that there are never-takers, always-takers, and compliers, but no deniers. Second, the instrument must be uncorrelated with the error term of the IV model. Third, the instrument can have no direct effect on the outcome of interest (that is, the effect is only induced through the exposure of interest). Fourth, the instrument must be correlated with the exposure of interest. This means that the probability of choosing palliative care will be different in the counseling and non-counseling arms.
The instrument in IV analysis is the characteristic or assignment that induces different distributions of the exposures of interest—in this case, the assignment to counseling or not, with the exposure of interest being palliative care versus the standard protocol. The target estimand in IV analysis is the Complier Average Treatment Effect (CATE). CATE can be estimated using IV analysis under four key assumptions:
- Monotonicity: There are never-takers, always-takers, and compliers, but no deniers.
- Independence: The instrument must be uncorrelated with the error term of the IV model.
- Exclusion Restriction: The instrument can have no direct effect on the outcome of interest; its effect must be solely through the exposure of interest.
- Relevance: The instrument must be correlated with the exposure of interest, meaning the probability of choosing palliative care differs between the counseling and non-counseling arms.
Here is the DAG that represents the last three assumptions (but not the monotinicity assumption):
The simulation for this scenario is a little more involved, because we need to generate outcomes depending on the stratum an individual falls in, and we need to generate the palliative care exposure differentially. The way I have implemented this is to generate \(P^0\), the potential exposure assuming an individual is randomized not to receive counseling. The probability of opting for palliative care is very low, less than 10%:
def_p0 <-
defData(varname = "A", formula = "1;1", dist = "trtAssign") |>
defData(varname = "X", formula = 0, variance =1, dist = "normal") |>
defData(varname = "U", formula = 0, variance =1, dist = "normal") |>
defData(varname = "P0", formula = "-2.5 + 0.2*X + 0.2*U", dist = "binary", link="logit")
set.seed(7271)
dd <- genData(10000, def_p0)
dd
## Key: <id>
## id A X U P0
## <int> <int> <num> <num> <int>
## 1: 1 1 -0.05885615 1.6427735 0
## 2: 2 1 -1.30383586 0.3174614 0
## 3: 3 1 0.61268664 -1.5143240 0
## 4: 4 0 2.07869248 0.4865766 0
## 5: 5 1 -0.43060933 -1.3370887 0
## ---
## 9996: 9996 0 -0.88119577 0.3312608 0
## 9997: 9997 1 -0.90726045 0.8927394 0
## 9998: 9998 1 -0.69425529 -1.4237684 0
## 9999: 9999 0 1.37659232 -0.4990659 0
## 10000: 10000 0 -0.71196863 0.2136513 0
dd[, mean(P0)]
## [1] 0.084
Next we generate \(P^1\), the potential exposure when randomized to counseling.
def_p1 <-
defCondition(
condition = "P0 == 0",
formula = "0.5 + X + U",
dist = "binary",
link = "logit"
) |>
defCondition(
condition = "P0 == 1",
formula = 1,
dist = "nonrandom"
)
dd <- addCondition(def_p1, dd, newvar = "P1")
dd
## Key: <id>
## id P1 A X U P0
## <int> <num> <int> <num> <num> <int>
## 1: 1 1 1 -0.05885615 1.6427735 0
## 2: 2 1 1 -1.30383586 0.3174614 0
## 3: 3 0 1 0.61268664 -1.5143240 0
## 4: 4 1 0 2.07869248 0.4865766 0
## 5: 5 1 1 -0.43060933 -1.3370887 0
## ---
## 9996: 9996 1 0 -0.88119577 0.3312608 0
## 9997: 9997 0 1 -0.90726045 0.8927394 0
## 9998: 9998 0 1 -0.69425529 -1.4237684 0
## 9999: 9999 1 0 1.37659232 -0.4990659 0
## 10000: 10000 1 0 -0.71196863 0.2136513 0
The probability of choosing palliative care is much higher under counseling (and is in fact 100% when \(P^0 = 1\)):
dd[, mean(P1), keyby = P0]
## Key: <P0>
## P0 V1
## <int> <num>
## 1: 0 0.5803493
## 2: 1 1.0000000
This next step is a little less intuitive. I’m first generating interim potential outcomes \(Q^0\) (for \(Y^0\)) and \(Q^1\) (for \(Y^1\)). \(Q^0\) depends on \(X\) and \(U\), and \(Q^1\) is centered around \(2 + Q^0\). The actual potential outcomes \(Y^0\) and \(Y^1\) depend on the status of \(P^0\) and \(P^1\), respectively. If \(P^0 = 0\) then \(Y^0\) takes on the value of \(Q^0\), but if \(P^0 = 1\) then \(Y^0\) takes on the value of \(Q^1\). The same logic defines \(Y^1\).
def_A <-
defDataAdd(varname = "Q0", formula = "X + U", variance = 16, dist = "normal") |>
defDataAdd(varname = "Q1", formula = "2 + Q0", variance = 0.75, dist = "normal") |>
defDataAdd(varname = "Y0", formula = "Q0*(P0==0) + Q1*(P0==1)", dist = "nonrandom") |>
defDataAdd(varname = "Y1", formula = "Q0*(P1==0) + Q1*(P1==1)", dist = "nonrandom") |>
defDataAdd(varname = "P", formula = "(A==0)*P0 + (A==1)*P1", dist = "nonrandom") |>
defDataAdd(varname = "Y", formula = "(P==0)*Y0 + (P==1)*Y1", dist = "nonrandom")
dd <- addColumns(def_A, dd)
dd[, .(A, P0, P1, P, Q0 = round(Q0, 2), Q1 = round(Q1, 2),
Y0 = round(Y0, 2), Y1 = round(Y1, 2), Y = round(Y, 2))]
## A P0 P1 P Q0 Q1 Y0 Y1 Y
## <int> <int> <num> <num> <num> <num> <num> <num> <num>
## 1: 1 0 1 1 -2.51 -1.21 -2.51 -1.21 -1.21
## 2: 1 0 1 1 -2.62 -1.59 -2.62 -1.59 -1.59
## 3: 1 0 0 0 -1.24 0.67 -1.24 -1.24 -1.24
## 4: 0 0 1 0 2.05 3.13 2.05 3.13 2.05
## 5: 1 0 1 1 2.49 4.32 2.49 4.32 4.32
## ---
## 9996: 0 0 1 0 0.95 3.29 0.95 3.29 0.95
## 9997: 1 0 0 0 1.34 2.49 1.34 1.34 1.34
## 9998: 1 0 0 0 2.66 7.25 2.66 2.66 2.66
## 9999: 0 0 1 0 1.59 3.93 1.59 3.93 1.59
## 10000: 0 0 1 0 1.79 3.49 1.79 3.49 1.79
This data generating process forces the causal effect of palliative care to be zero for never-takers and always-takers, and averages 2 for compliers.
dd[, mean(Y1 - Y0), keyby = .(P0, P1)]
## Key: <P0, P1>
## P0 P1 V1
## <int> <num> <num>
## 1: 0 0 0.000000
## 2: 0 1 1.999411
## 3: 1 1 0.000000
A model that adjusts for observed covariates only will provide a biased estimate for the effect of palliative care (in this case, leading to an overestimate):
tidy(lm(Y ~ P + X, data = dd))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.124 0.0518 -2.40 1.66e- 2
## 2 P 2.40 0.0894 26.8 6.65e-153
## 3 X 0.944 0.0427 22.1 1.03e-105
However, if we are able to measure and adjust for all covariates, we will get an unbiased estimate:
tidy(lm(Y ~ P + X + U, data = dd))
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.00817 0.0509 0.161 8.72e- 1
## 2 P 1.98 0.0891 22.3 2.07e-107
## 3 X 0.988 0.0417 23.7 1.08e-120
## 4 U 0.944 0.0417 22.6 1.33e-110
Although we cannot actually measure \(U\) (i.e., it is not observed), we can use IV estimation to get an unbiased estimate of the CATE. Note that \(A\) is introduced into the regression equation:
tidy(ivreg(Y ~ P + X | A + X, data = dd))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.0203 0.0686 0.296 7.67e- 1
## 2 P 1.98 0.158 12.5 1.56e- 35
## 3 X 0.983 0.0445 22.1 8.17e-106
To get a more robust assessment of the analytic methods, I’ve conducted a simple experiment that generates 1000 smaller data sets of 500 individuals each. For each generated data set, I am recording
- overall_po: average overall causal effect of never-takers, always-takers, and compliers combined based on the unobserved potential outcomes
- complier_po: compliers-only causal effect based on the potential outcomes
- overall: overall observed effect without any adjustment
- within: comparison of palliative care vs the standard protocol for patients who received counselling
- cross: comparison of patients who received counseling and opted for palliative care with patients who did not receive counseling and received the standard protocol
- lm_x: estimated effect adjusting for \(X\) alone
- lm_xu: estimated effect adjusting for both \(X\) and \(U\)
- iv: IV estimate
gen_ests <- function(n) {
dd <- genData(n, def_p0)
dd <- addCondition(def_p1, dd, newvar = "P1")
dd <- addColumns(def_A, dd)
overall_po <- dd[,mean(Y1-Y0)]
dx <- dd[P0 == 0 & P1 == 1]
complier_po <- dx[,mean(Y1-Y0)]
overall <- dd[P == 1, mean(Y)] - dd[P == 0, mean(Y)]
within <- dd[P == 1 & A == 1, mean(Y)] - dd[P == 0 & A == 1, mean(Y)]
cross <- dd[P == 1 & A == 1, mean(Y)] - dd[P == 0 & A == 0, mean(Y)]
lm_x <- coef(lm(Y ~ P + X, data = dd))["P"]
lm_xu <- coef(lm(Y ~ P + X + U, data = dd))["P"]
iv <- coef(ivreg(Y ~ P + X | A + X, data = dd))["P"]
data.table(overall_po, complier_po, overall, within, cross, lm_x, lm_xu, iv)
}
res <- rbindlist(lapply(1:1000, function(x) gen_ests(500)))
Below, the means and standard deviations of the estimates across all 1000 iterations are shown. As expected, all estimates are biased except for the complier average treatment effect based on the potential outcomes, the linear model adjusting for both \(X\) and \(U\), and the IV estimate:
lapply(res, function(x) c( round(mean(x), 2), round(sd(x), 2) ))
## $overall_po
## [1] 1.08 0.05
##
## $complier_po
## [1] 2.00 0.05
##
## $overall
## [1] 2.80 0.39
##
## $within
## [1] 3.44 0.54
##
## $cross
## [1] 2.56 0.43
##
## $lm_x
## [1] 2.41 0.39
##
## $lm_xu
## [1] 1.99 0.39
##
## $iv
## [1] 1.97 0.68
Caveats
While IV estimation works well in this idealized setting, there are some key limitations worth noting. First, consider the Complier Average Treatment Effect (CATE). Is the CATE truly what we are interested in? If most patients are compliers, then perhaps it is, but if they represent only a small proportion of the population, the usefulness of the information becomes less clear. We are likely more interested in understanding the effect of an approach that will be acceptable to a significant portion of eligible patients. The size of this “significant portion” will depend on the context—perhaps 25% is sufficient in some cases, while in others, we may require closer to 75%.
The second issue is the assumption that counseling has no direct effect on the outcome. Although I’ve been vague about the outcome in question, it’s crucial to choose an outcome carefully to ensure that the act of receiving counseling, independent of the protocol selected, does not influence the outcome. If this assumption is violated, the IV estimate will no longer be unbiased for the CATE.
Reference:
Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.