Considering sensitivity to unmeasured confounding: part 2

In part 1 of this 2-part series, I introduced the notion of sensitivity to unmeasured confounding in the context of an observational data analysis. I argued that an estimate of an association between an observed exposure $D$ and outcome $Y$ is sensitive to unmeasured confounding if we can conceive of a reasonable alternative data generating process (DGP) that includes some unmeasured confounder and that will generate the same distribution as the observed data. I further argued that reasonableness can be quantified or parameterized by the two correlation coefficients $\rho_{UD}$ and $\rho_{UY}$, which measure the strength of the relationship of the unmeasured confounder $U$ with each of the observed measures. Alternative DGPs characterized by high correlation coefficients can be viewed as less realistic, in which case the observed data could be considered less sensitive to unmeasured confounding. On the other hand, if DGPs characterized by lower correlation coefficients can explain away the association, the observed data would be considered more sensitive.

I need to pause here for a moment to point out that something similar has been described much more thoroughly by a group at NYU's PRIISM (see Carnegie, Harada & Hill and Dorie et al.). In fact, this group of researchers has created an R package called treatSens to facilitate sensitivity analysis. I believe the discussion in these posts is consistent with the PRIISM methodology, except that treatSens is far more flexible (e.g., it can handle binary exposures) and provides more informative output than what I am describing. I am hoping that the examples and the derivation of an equivalent DGP that I show here provide some additional insight into what sensitivity means.

I’ve been wrestling with these issues for a while, but the ideas for the derivation of an alternative DGP were actually motivated by this recent note by Fei Wan on an unrelated topic. (Wan shows how a valid instrumental variable may appear to violate a key assumption even though it does not.) The key element of Wan’s argument for my purposes is how the coefficient estimates of an observed model relate to the coefficients of an alternative (possibly true) data generation process/model.

OK - now we are ready to walk through the derivation of alternative DGPs for an observed data set.

Two DGPs, same data

Recall from Part 1 that we have an observed data model

$$Y = k_0 + k_1 D + \epsilon_Y,$$

where $\epsilon_Y \sim N\left(0, \sigma^2_Y\right)$. We are wondering if there is another DGP that could have generated the data that we have actually observed:

$$
\begin{aligned}
D &= \alpha_0 + \alpha_1 U + \epsilon_D \\
Y &= \beta_0 + \beta_1 D + \beta_2 U + \epsilon_{Y^*},
\end{aligned}
$$

where $U$ is some unmeasured confounder, and $\epsilon_D \sim N\left(0, \sigma^2_D\right)$ and $\epsilon_{Y^*} \sim N\left(0, \sigma^2_{Y^*}\right)$. Can we go even further and find an alternative DGP where $D$ has no direct effect on $Y$ at all?

$$
\begin{aligned}
D &= \alpha_0 + \alpha_1 U + \epsilon_D \\
Y &= \beta_0 + \beta_2 U + \epsilon_{Y^*}
\end{aligned}
$$

$\alpha_1$ (and $\sigma_{\epsilon_D}^2$) derived from $\rho_{UD}$

In a simple linear regression model with a single predictor, the coefficient $\alpha_1$ can be specified directly in terms of $\rho_{UD}$, the correlation between $U$ and $D$:

$$\alpha_1 = \rho_{UD} \frac{\sigma_D}{\sigma_U}$$

We can estimate $\sigma_D$ from the observed data set, and we can reasonably assume that $\sigma_U = 1$ (since we could always normalize the original measurement of $U$). Finally, we can specify a range of $\rho_{UD}$ (I am keeping everything positive here for simplicity), such that $0 < \rho_{UD} < 0.90$ (where I assume a correlation of $0.90$ is at or beyond the realm of reasonableness). By plugging these three parameters into the formula, we can generate a range of $\alpha_1$'s.

Furthermore, we can derive an estimate of the variance of $\epsilon_D$ ($\sigma_{\epsilon_D}^2$) at each level of $\rho_{UD}$:

$$
\begin{aligned}
Var(D) &= Var(\alpha_0 + \alpha_1 U + \epsilon_D) \\
\sigma_D^2 &= \alpha_1^2 \sigma_U^2 + \sigma_{\epsilon_D}^2 \\
\sigma_{\epsilon_D}^2 &= \sigma_D^2 - \alpha_1^2 \; \text{(since } \sigma_U^2 = 1\text{)}
\end{aligned}
$$

So, for each value of $\rho_{UD}$ that we generated, there is a corresponding pair $(\alpha_1, \; \sigma_{\epsilon_D}^2)$.
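As a quick sanity check (this is not part of the derivation itself, and the values of $\rho_{UD}$ and $\sigma_D$ are arbitrary), a large simulated sample confirms that the pair $(\alpha_1, \; \sigma_{\epsilon_D}^2)$ implies the intended correlation and variance:

set.seed(1)
n <- 1e6

rhoUD <- 0.5                  # assumed correlation of U and D
sdD <- 2                      # sd of D, as if estimated from observed data

U <- rnorm(n)                 # sigma_U = 1
a1 <- rhoUD * sdD             # alpha_1 = rho_UD * sigma_D / sigma_U
s2ed <- sdD^2 - a1^2          # derived residual variance of D

D <- a1 * U + rnorm(n, 0, sqrt(s2ed))

cor(U, D)                     # ~ 0.5
sd(D)                         # ~ 2.0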

Determine $\beta_2$

In the addendum I go through a bit of an elaborate derivation of $\beta_2$, the coefficient of $U$ in the alternative outcome model. Here is the bottom line:

$$\beta_2 = \frac{\alpha_1}{1-\frac{\sigma_{\epsilon_D}^2}{\sigma_D^2}}\left( k_1 - \beta_1\right)$$

In the equation, we have $\sigma^2_D$ and $k_1$, which are both estimated from the observed data, and the pair of derived parameters $\alpha_1$ and $\sigma_{\epsilon_D}^2$ based on $\rho_{UD}$. $\beta_1$, the coefficient of $D$ in the alternative outcome model, is a free parameter, set externally. That is, we can choose to evaluate all the $\beta_2$'s that are generated when $\beta_1 = 0$. More generally, we can set $\beta_1 = pk_1$, where $0 \le p \le 1$. (We could go negative if we wanted to, though I won't do that here.) If $p = 1$, then $\beta_1 = k_1$ and $\beta_2 = 0$; we end up with the original model with no confounding.

So, once we specify $\rho_{UD}$ and $p$, we get the corresponding triplet $(\alpha_1, \; \sigma_{\epsilon_D}^2, \; \beta_2)$.
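To make this concrete, here is a short sketch that computes one such triplet; the values of $\sigma_D^2$, $k_1$, $\rho_{UD}$, and $p$ are all hypothetical:

s2D <- 1; k1 <- 1.5           # "observed" quantities (hypothetical)
rhoUD <- 0.4; p <- 0          # specified sensitivity parameters

a1 <- rhoUD * sqrt(s2D)
s2ed <- s2D - a1^2
b1 <- p * k1
b2 <- (a1 / (1 - s2ed / s2D)) * (k1 - b1)

round(c(a1 = a1, s2ed = s2ed, b2 = b2), 2)  # 0.40 0.84 3.75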

Determine $\rho_{UY}$

In this last step, we can identify the correlation of $U$ and $Y$, $\rho_{UY}$, that is associated with all the observed, specified, and derived parameters up until this point. We start by writing the alternative outcome model, then replace $D$ with the alternative exposure model, and do some algebraic manipulation to end up with a re-parameterized alternative outcome model that has a single predictor:

$$
\begin{aligned}
Y &= \beta_0 + \beta_1 D + \beta_2 U + \epsilon_{Y^*} \\
&= \beta_0 + \beta_1 \left( \alpha_0 + \alpha_1 U + \epsilon_D \right) + \beta_2 U + \epsilon_{Y^*} \\
&= \beta_0 + \beta_1 \alpha_0 + \beta_1 \alpha_1 U + \beta_1 \epsilon_D + \beta_2 U + \epsilon_{Y^*} \\
&= \beta_0^* + \left( \beta_1 \alpha_1 + \beta_2 \right) U + \epsilon_Y^+ \\
&= \beta_0^* + \beta_1^* U + \epsilon_Y^+,
\end{aligned}
$$

where $\beta_0^* = \beta_0 + \beta_1 \alpha_0$, $\beta_1^* = \beta_1 \alpha_1 + \beta_2$, and $\epsilon_Y^+ = \beta_1 \epsilon_D + \epsilon_{Y^*}$.

As before, the coefficient in a simple linear regression model with a single predictor is related to the correlation of the two variables as follows:

$$\beta_1^* = \rho_{UY} \frac{\sigma_Y}{\sigma_U}$$

Since $\beta_1^* = \beta_1 \alpha_1 + \beta_2$,

$$
\begin{aligned}
\beta_1 \alpha_1 + \beta_2 &= \rho_{UY} \frac{\sigma_Y}{\sigma_U} \\
\rho_{UY} &= \frac{\sigma_U}{\sigma_Y} \left( \beta_1 \alpha_1 + \beta_2 \right) \\
&= \frac{\beta_1 \alpha_1 + \beta_2}{\sigma_Y} \; \text{(since } \sigma_U = 1\text{)}
\end{aligned}
$$
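Continuing the hypothetical numbers from the earlier sketch (where $\rho_{UD} = 0.4$ and $p = 0$), and assuming $\sigma_Y = 5$:

a1 <- 0.4; b1 <- 0; b2 <- 3.75   # triplet from the sketch above
sdY <- 5                         # hypothetical sd of Y

(rhoUY <- (b1 * a1 + b2) / sdY)  # 0.75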

Determine $\sigma^2_{Y^*}$

In order to simulate data from the alternative DGPs, we need to derive the variance of the noise in the alternative outcome model. That is, we need an estimate of $\sigma^2_{Y^*}$.

$$
\begin{aligned}
Var(Y) &= Var(\beta_0 + \beta_1 D + \beta_2 U + \epsilon_{Y^*}) \\
&= \beta_1^2 Var(D) + \beta_2^2 Var(U) + 2\beta_1\beta_2 Cov(D, U) + Var(\epsilon_{Y^*}) \\
&= \beta_1^2 \sigma^2_D + \beta_2^2 + 2\beta_1\beta_2\rho_{UD}\sigma_D + \sigma^2_{Y^*}
\end{aligned}
$$

So,

$$\sigma^2_{Y^*} = Var(Y) - \left(\beta_1^2 \sigma^2_D + \beta_2^2 + 2\beta_1\beta_2\rho_{UD}\sigma_D\right),$$

where $Var(Y)$ is the variance of $Y$ from the observed data. Now we are ready to implement this in R.
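Before turning to the implementation, here is a self-contained check (with hypothetical parameter values throughout) that ties the derivation together: data simulated from an alternative DGP with no direct effect of $D$ on $Y$ still reproduces the observed coefficient $k_1$ and $Var(Y)$:

set.seed(2)
n <- 1e6

s2D <- 1; k1 <- 1.5; varY <- 25  # "observed" quantities (hypothetical)
rhoUD <- 0.5                     # specified correlation of U and D

a1 <- rhoUD * sqrt(s2D)
s2ed <- s2D - a1^2
b1 <- 0                          # p = 0: no direct effect of D on Y
b2 <- (a1 / (1 - s2ed / s2D)) * (k1 - b1)
s2ey <- varY - (b1^2 * s2D + b2^2 + 2 * b1 * b2 * rhoUD * sqrt(s2D))

U <- rnorm(n)
D <- a1 * U + rnorm(n, 0, sqrt(s2ed))
Y <- b1 * D + b2 * U + rnorm(n, 0, sqrt(s2ey))

coef(lm(Y ~ D))["D"]             # ~ 1.5, even though b1 = 0
var(Y)                           # ~ 25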

Implementing in R

If we have an observed data set with observed $D$ and $Y$, and some target $\beta_1$ determined by $p$, we can calculate/generate all the quantities that we just derived.

Before getting to the function, I want to make a brief point about what we do if we have other measured confounders. We can essentially eliminate measured confounders by regressing the exposure $D$ on the confounders and conducting the entire sensitivity analysis with the residual exposure measurements derived from this initial regression model. I won't be doing this here, but if anyone wants to see an example of this, let me know, and I can do a short post.

OK - here is the function, which essentially follows the path of the derivation above:

library(data.table)

altDGP <- function(dd, p) {
  
  # Create values of rhoUD
  
  dp <- data.table(p = p, rhoUD = seq(0.0, 0.9, length = 1000))
  
  # Parameters estimated from data
  
  dp[, `:=`(sdD = sd(dd$D), s2D = var(dd$D), sdY = sd(dd$Y))] 
  dp[, k1:= coef(lm(Y ~ D, data = dd))[2]]
  
  # Generate b1 based on p
  
  dp[, b1 := p * k1]
  
  # Determine a1
  
  dp[, a1 := rhoUD * sdD ]
  
  # Determine s2ed
  
  dp[, s2ed := s2D - (a1^2)]
  
  # Determine b2
  
  dp[, g:= s2ed/s2D]
  dp <- dp[g != 1]
  dp[, b2 := (a1 / (1 - g) ) * ( k1 - b1 )]
  
  # Determine rhoUY
  
  dp[, rhoUY := ( (b1 * a1) + b2 ) / sdY ]
  
  # Eliminate impossible correlations
  
  dp <- dp[rhoUY > 0 & rhoUY <= .9]
  
  # Determine s2eyx
  
  dp[, s2eyx := sdY^2 - (b1^2 * s2D + b2^2 + 2 * b1 * b2 * rhoUD * sdD)]
  dp <- dp[s2eyx > 0]
  
  # Determine standard deviations
  
  dp[, sdeyx := sqrt(s2eyx)]
  dp[, sdedx := sqrt(s2ed)]
  
  # Finished
  
  dp[]
  
}

Assessing sensitivity

If we generate the same data set we started out with in the last post, we can use the function to assess the sensitivity of this association.

library(simstudy)

defO <- defData(varname = "D", formula = 0, variance = 1)
defO <- defData(defO, varname = "Y", formula = "1.5 * D", variance = 25)

set.seed(20181201)
dtO <- genData(1200, defO)

In this first example, I am looking for the DGP with $\beta_1 = 0$, which is implemented as $p = 0$ in the call to function altDGP. Each row of output represents an alternative set of parameters that will result in a DGP with $\beta_1 = 0$.

dp <- altDGP(dtO, p = 0)
dp[, .(rhoUD, rhoUY, k1, b1, a1, s2ed, b2, s2eyx)]
##      rhoUD rhoUY   k1 b1    a1  s2ed   b2 s2eyx
##   1: 0.295 0.898 1.41  0 0.294 0.904 4.74  5.36
##   2: 0.296 0.896 1.41  0 0.295 0.903 4.72  5.50
##   3: 0.297 0.893 1.41  0 0.296 0.903 4.71  5.63
##   4: 0.298 0.890 1.41  0 0.297 0.902 4.69  5.76
##   5: 0.299 0.888 1.41  0 0.298 0.902 4.68  5.90
##  ---                                           
## 668: 0.896 0.296 1.41  0 0.892 0.195 1.56 25.35
## 669: 0.897 0.296 1.41  0 0.893 0.193 1.56 25.35
## 670: 0.898 0.296 1.41  0 0.894 0.191 1.56 25.36
## 671: 0.899 0.295 1.41  0 0.895 0.190 1.56 25.36
## 672: 0.900 0.295 1.41  0 0.896 0.188 1.55 25.37

Now, I am creating a data set that will be based on four levels of $\beta_1$. I do this by creating a vector $p = \langle 0.0, \; 0.2, \; 0.5, \; 0.8 \rangle$. The idea is to create a plot that shows the curve for each value of $p$. The most extreme curve (in this case, the curve all the way to the right, since we are dealing with positive associations only) represents the scenario where $p = 0$ (i.e., $\beta_1 = 0$). The curves moving to the left reflect increasing sensitivity as $p$ increases.

dsenO <- rbindlist(lapply(c(0.0, 0.2, 0.5, 0.8), 
                     function(x) altDGP(dtO, x)))
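The original plots are not reproduced here, but a minimal ggplot2 sketch along the following lines would draw one curve per level of $p$ (the column names match the output of altDGP):

library(ggplot2)

ggplot(dsenO, aes(x = rhoUD, y = rhoUY, color = factor(p))) +
  geom_line() +
  labs(x = expression(rho[UD]), y = expression(rho[UY]), color = "p")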

I would say that in this first case the observed association is moderately sensitive to unmeasured confounding, as correlations as low as 0.5 would be enough to erase the association.

In the next case, the association remains unchanged but the variation of $Y$ is considerably reduced, and the observed association is much less sensitive. However, it is still quite possible that the observed association is at least partially overstated, as relatively low levels of correlation could reduce the estimated association.

defA1 <- updateDef(defO, changevar = "Y", newvariance = 4)

In this last scenario, the variance is the same as in the initial scenario, but the association is considerably weaker. Here, we see that the estimate of the association is extremely sensitive to unmeasured confounding, as only low levels of correlation are needed to entirely erase the association.

defA2 <- updateDef(defO, changevar = "Y", newformula = "0.25 * D")
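The discussion of these two scenarios relies on their sensitivity curves; to reproduce them, presumably the same steps as before would be applied to the updated definitions:

# Scenario 2: same association, much smaller outcome variance
dtA1 <- genData(1200, defA1)
dsenA1 <- rbindlist(lapply(c(0.0, 0.2, 0.5, 0.8), function(x) altDGP(dtA1, x)))

# Scenario 3: same variance, much weaker association
dtA2 <- genData(1200, defA2)
dsenA2 <- rbindlist(lapply(c(0.0, 0.2, 0.5, 0.8), function(x) altDGP(dtA2, x)))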

treatSens package

I want to show the output generated by the treatSens package I referenced earlier. treatSens requires a formula that includes an outcome vector $Y$, an exposure vector $Z$, and at least one vector of measured confounders $X$. In my examples, I have included no measured confounders, so I generate a vector of independent noise that is not related to the outcome.

library(treatSens)

X <- rnorm(1200)
Y <- dtO$Y
Z <- dtO$D

testsens <- treatSens(Y ~ Z + X, nsim = 5)
sensPlot(testsens)

Once treatSens has been executed, it is possible to generate a sensitivity plot, which looks substantively similar to the ones I have created. The package uses sensitivity parameters $\zeta^Z$ and $\zeta^Y$, which represent the coefficients of $U$, the unmeasured confounder. Since treatSens normalizes the data (in the default setting), these coefficients are actually equivalent to the correlations $\rho_{UD}$ and $\rho_{UY}$ that are the basis of my sensitivity analysis. An important difference in the output is that treatSens provides uncertainty bands, and extends into regions of negative correlation. (And of course, a more significant difference is that treatSens is flexible enough to handle binary exposures, whereas I have not yet extended my analytic approach in that direction; I suspect it is not possible for me to do so due to the non-collapsibility of logistic regression estimands. I hope to revisit this in the future.)

Observed data scenario 1: $Y \sim N(1.50Z, \; 25)$

Observed data scenario 2: $Y \sim N(1.50Z, \; 4)$

Observed data scenario 3: $Y \sim N(0.25Z, \; 25)$

Addendum: Derivation of $\beta_2$

In case you want more detail on how we derive $\beta_2$ from the observed data model and the assumed correlation parameters, here it is. We start by specifying the simple observed outcome model:

$$Y = k_0 + k_1 D + \epsilon_Y$$

We can estimate the parameters $k_0$ and $k_1$ using this standard matrix solution:

$$\langle k_0, \; k_1 \rangle = (W^T W)^{-1} W^T Y,$$

where $W$ is the $n \times 2$ design matrix:

$$W = [\mathbf{1}, D]_{n \times 2}.$$

We can replace $Y$ with the alternative outcome model:

$$
\begin{aligned}
\langle k_0, \; k_1 \rangle &= (W^T W)^{-1} W^T (\beta_0 + \beta_1 D + \beta_2 U + \epsilon_{Y^*}) \\
&= \langle \beta_0, 0 \rangle + \langle 0, \beta_1 \rangle + (W^T W)^{-1} W^T (\beta_2 U) + \mathbf{0} \\
&= \langle \beta_0, \beta_1 \rangle + (W^T W)^{-1} W^T (\beta_2 U)
\end{aligned}
$$

Note that

$$
\begin{aligned}
(W^T W)^{-1} W^T (\beta_0) &= \langle \beta_0, \; 0 \rangle \quad \text{and} \\
(W^T W)^{-1} W^T (\beta_1 D) &= \langle 0, \; \beta_1 \rangle.
\end{aligned}
$$

Now, we need to figure out what $(W^T W)^{-1} W^T (\beta_2 U)$ is. First, we rearrange the alternative exposure model:

$$
\begin{aligned}
D &= \alpha_0 + \alpha_1 U + \epsilon_D \\
\alpha_1 U &= D - \alpha_0 - \epsilon_D \\
U &= \frac{1}{\alpha_1} \left( D - \alpha_0 - \epsilon_D \right) \\
\beta_2 U &= \frac{\beta_2}{\alpha_1} \left( D - \alpha_0 - \epsilon_D \right)
\end{aligned}
$$

We can replace $\beta_2 U$:

$$
\begin{aligned}
(W^T W)^{-1} W^T (\beta_2 U) &= (W^T W)^{-1} W^T \left[ \frac{\beta_2}{\alpha_1} \left( D - \alpha_0 - \epsilon_D \right) \right] \\
&= \langle -\frac{\beta_2}{\alpha_1}\alpha_0, 0 \rangle + \langle 0, \frac{\beta_2}{\alpha_1} \rangle - \frac{\beta_2}{\alpha_1}(W^T W)^{-1} W^T \epsilon_D \\
&= \langle -\frac{\beta_2}{\alpha_1}\alpha_0, \frac{\beta_2}{\alpha_1} \rangle - \frac{\beta_2}{\alpha_1}(W^T W)^{-1} W^T \epsilon_D
\end{aligned}
$$

And now we get back to $\langle k_0, \; k_1 \rangle$:

$$
\begin{aligned}
\langle k_0, \; k_1 \rangle &= \langle \beta_0, \; \beta_1 \rangle + (W^T W)^{-1} W^T (\beta_2 U) \\
&= \langle \beta_0 - \frac{\beta_2}{\alpha_1}\alpha_0, \; \beta_1 + \frac{\beta_2}{\alpha_1} \rangle - \frac{\beta_2}{\alpha_1}(W^T W)^{-1} W^T \epsilon_D \\
&= \langle \beta_0 - \frac{\beta_2}{\alpha_1}\alpha_0, \; \beta_1 + \frac{\beta_2}{\alpha_1} \rangle - \frac{\beta_2}{\alpha_1} \langle \gamma_0, \; \gamma_1 \rangle
\end{aligned}
$$

where $\gamma_0$ and $\gamma_1$ come from regressing $\epsilon_D$ on $D$:

$$\epsilon_D = \gamma_0 + \gamma_1 D$$

so,

$$
\begin{aligned}
\langle k_0, \; k_1 \rangle &= \langle \beta_0 - \frac{\beta_2}{\alpha_1}\alpha_0 - \frac{\beta_2}{\alpha_1}\gamma_0, \; \beta_1 + \frac{\beta_2}{\alpha_1} - \frac{\beta_2}{\alpha_1}\gamma_1 \rangle \\
&= \langle \beta_0 - \frac{\beta_2}{\alpha_1}\left(\alpha_0 + \gamma_0\right), \; \beta_1 + \frac{\beta_2}{\alpha_1}\left(1 - \gamma_1 \right) \rangle
\end{aligned}
$$

Since we can center all the observed data, we can easily assume that $k_0 = 0$. All we need to worry about is $k_1$:

$$
\begin{aligned}
k_1 &= \beta_1 + \frac{\beta_2}{\alpha_1}\left(1 - \gamma_1 \right) \\
\frac{\beta_2}{\alpha_1}\left(1 - \gamma_1 \right) &= k_1 - \beta_1 \\
\beta_2 &= \frac{\alpha_1}{1-\gamma_1}\left( k_1 - \beta_1\right)
\end{aligned}
$$

We have generated $\alpha_1$ based on $\rho_{UD}$, $k_1$ is estimated from the data, and $\beta_1$ is fixed based on some $p$, $0 \le p \le 1$, such that $\beta_1 = pk_1$. All that remains is $\gamma_1$:

$$\gamma_1 = \rho_{\epsilon_D D} \frac{\sigma_{\epsilon_D}}{\sigma_D}$$

Since $D = \alpha_0 + \alpha_1 U + \epsilon_D$ (and $\epsilon_D \perp \!\!\! \perp U$),

$$
\begin{aligned}
\rho_{\epsilon_D D} &= \frac{Cov(\epsilon_D, D)}{\sigma_{\epsilon_D} \sigma_D} \\
&= \frac{Cov(\epsilon_D, \; \alpha_0 + \alpha_1 U + \epsilon_D)}{\sigma_{\epsilon_D} \sigma_D} \\
&= \frac{\sigma_{\epsilon_D}^2}{\sigma_{\epsilon_D} \sigma_D} \\
&= \frac{\sigma_{\epsilon_D}}{\sigma_D}
\end{aligned}
$$

It follows that

$$
\begin{aligned}
\gamma_1 &= \rho_{\epsilon_D D} \frac{\sigma_{\epsilon_D}}{\sigma_D} \\
&= \frac{\sigma_{\epsilon_D}}{\sigma_D} \times \frac{\sigma_{\epsilon_D}}{\sigma_D} \\
&= \frac{\sigma_{\epsilon_D}^2}{\sigma_D^2}
\end{aligned}
$$
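A quick numerical check of this identity (again with hypothetical values): regressing a simulated $\epsilon_D$ on $D$ recovers $\sigma_{\epsilon_D}^2 / \sigma_D^2$:

set.seed(3)
n <- 1e6

a1 <- 0.5; s2ed <- 0.75       # so that Var(D) = a1^2 + s2ed = 1
U <- rnorm(n)
eD <- rnorm(n, 0, sqrt(s2ed))
D <- a1 * U + eD

coef(lm(eD ~ D))["D"]         # ~ s2ed / s2D = 0.75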

So, now we have all the elements to generate $\beta_2$ for a range of $\alpha_1$'s and $\sigma_{\epsilon_D}^2$'s:

$$\beta_2 = \frac{\alpha_1}{1-\frac{\sigma_{\epsilon_D}^2}{\sigma_D^2}}\left( k_1 - \beta_1\right)$$
