1 Causal Inference and Potential Outcomes
This week we will introduce the topic of causal inference. We will outline a specific definition of causality using the potential outcomes framework, and will describe the fundamental problem of causal inference. We will highlight the persistent threat of selection bias in observational data and we will discuss differences between statistical inference and causal inference.
There are excellent introductions to the potential outcomes framework in Angrist and Pischke, 2014 (Introduction) and in Gerber and Green, 2012 (Chapter 1). The Paul Holland article on Statistics and Causal Inference provides an excellent history of the framework, and relates the conception of causality we focus on to long-standing philosophical discussions. If you are looking for inspiration, and a fun read, then the first half of David Freedman’s article on Statistical Models and Shoe Leather (pp. 291-300) provides an interesting history of John Snow’s (no, not the guy from Game of Thrones) study of cholera in 19th century London. The second half of that article is also worth reading, as it makes a fairly strong argument for why “statistical technique can seldom be an adequate substitute for good design, relevant data, and testing predictions against reality in a variety of settings.”
1.1 Seminar
For this week’s applied material we will review some key concepts of the “potential outcomes” framework and review some basics of using R. If you did not take PUBL0055 last term, or if you are struggling to remember R from all the way back in December, then you should work through the exercises on the R refresher page before completing this assignment.
1.1.1 Potential Outcomes review
The following questions are designed to help you get familiar with the potential outcomes framework for causal inference that we discussed in the lecture.
- Explain the notation \(Y_{0i}\).
The potential outcome for subject \(i\) if this subject were untreated. Put another way: the untreated potential outcome for subject \(i\).
- Explain the notation \(Y_{1i}\).
The potential outcome for subject \(i\) if this subject were treated. Put another way: the treated potential outcome for subject \(i\).
- Contrast the meaning of \(Y_{0i}\) with the meaning of \(Y_i\).
The first is the potential outcome for subject \(i\) if this subject were untreated. The second is simply the observed outcome for subject \(i\).
- Can we observe both \(Y_{0i}\) and \(Y_{1i}\) for any individual unit at the same time?
No, recall that:
- \(Y_{0i}\) = the potential outcome for \(i\) under control.
- \(Y_{1i}\) = the potential outcome for \(i\) under treatment.
Only one of the two potential outcomes for \(i\) can ever be realized, as a subject cannot be under control and treatment simultaneously. Consequently, observing both potential outcomes is not possible. This is known as the “fundamental problem of causal inference”.
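The link between potential and observed outcomes is often written as the “switching equation”, \(Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}\). The short Python sketch below illustrates it. The two units and their values are hypothetical and chosen only for illustration, and the function name `observed_outcome` is likewise an assumption rather than anything from the lecture.

```python
# Hypothetical potential outcomes for two units (illustrative values only).
units = [
    {"y1": 5, "y0": 3, "d": 1},  # treated: y1 is observed, y0 stays counterfactual
    {"y1": 4, "y0": 6, "d": 0},  # control: y0 is observed, y1 stays counterfactual
]


def observed_outcome(y1, y0, d):
    """Switching equation: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i."""
    return d * y1 + (1 - d) * y0


# Only one potential outcome per unit ever makes it into the observed data.
observed = [observed_outcome(u["y1"], u["y0"], u["d"]) for u in units]
print(observed)
```

For each unit, the potential outcome that is not selected by \(D_i\) never appears in the observed data, which is exactly the fundamental problem described above.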
- If \(D_i\) is a binary variable that gives the treatment status for subject \(i\) (1 if treated, 0 if control), what is the meaning of \(E[Y_{0i}|D_i = 1]\)?
The expected value of the potential outcome for subject \(i\) if the subject were untreated, given that this subject actually receives treatment. Put another way: the expected value of the untreated potential outcome for a subject in the treatment group.
- The table below contains the potential outcomes (\(Y_{1i}\) and \(Y_{0i}\)) and the treatment indicator (\(D_i\)) from a hypothetical experiment with 6 units. Complete the following calculations by hand.
- List the observed outcomes (\(Y_i\)) for the experiment based on the table.
- Calculate the “true” average treatment effect (ATE) based on the potential outcomes.
- Calculate the “true” average treatment effect on the treated (ATT) based on the potential outcomes.
- Calculate the “estimated” average treatment effect based on the naive difference in group means for treatment and control conditions from the observed outcomes. Explain the difference between this estimate and the “true” average treatment effect.
Unit | \(Y_{1i}\) | \(Y_{0i}\) | \(D_i\) |
---|---|---|---|
1 | 2 | 2 | 1 |
2 | 3 | -1 | 1 |
3 | -1 | 9 | 1 |
4 | 17 | 8 | 0 |
5 | 12 | 9 | 0 |
6 | 9 | 1 | 0 |
2, 3, -1, 8, 9, 1
Note that in a real experiment, these are the only values (along with the treatment assignment) we would observe. The other potential outcomes (i.e. \(Y_{1i}\) for observations with \(D_i = 0\) and \(Y_{0i}\) for observations with \(D_i = 1\)) are unobservable.
\(\tau_\text{ATE} = \frac{0 + 4 - 10 + 9 + 3 + 8}{6} = \frac{14}{6} \approx 2.33\)
\(\tau_\text{ATT} = \frac{0 + 4 - 10}{3} = \frac{-6}{3} = -2\)
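As a check on the hand calculations, the two “true” quantities can be reproduced in a few lines of Python. This is only a sketch: the variable names (`y1`, `y0`, `d`, `tau`) are our own, and the values are copied from the table of potential outcomes.

```python
# Potential outcomes and treatment indicator copied from the table (units 1-6).
y1 = [2, 3, -1, 17, 12, 9]
y0 = [2, -1, 9, 8, 9, 1]
d = [1, 1, 1, 0, 0, 0]

# Unit-level treatment effects: tau_i = Y_1i - Y_0i.
tau = [a - b for a, b in zip(y1, y0)]

# The ATE averages tau_i over all units.
ate = sum(tau) / len(tau)

# The ATT averages tau_i over the treated units only.
tau_treated = [t for t, di in zip(tau, d) if di == 1]
att = sum(tau_treated) / len(tau_treated)

print(tau)
print(round(ate, 2), att)
```

Both calculations rely on seeing \(Y_{1i}\) and \(Y_{0i}\) for every unit, so they are possible here only because the table is hypothetical.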
\(\hat{\tau}_\text{ATE} = E[Y_i|D_i = 1] - E[Y_i|D_i=0] = \frac{2+3-1}{3} - \frac{8+9+1}{3} = \frac{4}{3} - \frac{18}{3} \approx -4.67\)
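Recall that the expected difference in group means (DIGM) can be decomposed as \(E[\text{DIGM}] = E[\tau_i|D_i=1] + E[Y_{0i}|D_i = 1] - E[Y_{0i}|D_i=0]\), where the first term is the ATT and the remaining two terms are the selection bias. This means that the difference in group means is, in general, an unbiased estimator of the ATE only when a) the ATE is equal to the ATT, and b) there is no selection bias.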
In this case neither is true, and so this estimated ATE is very different from the “true” ATE:
- \(\tau_\text{ATT} = \frac{0 + 4 - 10}{3} = -2\)
- \(\text{Selection bias} = \frac{2 - 1 + 9}{3} - \frac{8 + 9 + 1}{3} \approx -2.67\)
Together these two terms account for the naive estimate: \(-2 + (-2.67) = -4.67\).
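The decomposition of the difference in group means can also be verified numerically. The Python sketch below uses the table values. The helper `mean` and all variable names are our own choices, and the final assertion simply checks that the naive estimate equals the ATT plus the selection bias for these six units.

```python
# Potential outcomes and treatment indicator copied from the table (units 1-6).
y1 = [2, 3, -1, 17, 12, 9]
y0 = [2, -1, 9, 8, 9, 1]
d = [1, 1, 1, 0, 0, 0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Observed outcomes: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i.
y = [di * a + (1 - di) * b for a, b, di in zip(y1, y0, d)]

treated = [i for i, di in enumerate(d) if di == 1]
control = [i for i, di in enumerate(d) if di == 0]

# Naive estimate: difference in observed group means.
naive = mean([y[i] for i in treated]) - mean([y[i] for i in control])

# ATT: average unit-level effect among the treated.
att = mean([y1[i] - y0[i] for i in treated])

# Selection bias: E[Y_0i | D_i = 1] - E[Y_0i | D_i = 0].
selection_bias = mean([y0[i] for i in treated]) - mean([y0[i] for i in control])

print(y)
print(round(naive, 2), round(att, 2), round(selection_bias, 2))

# The naive difference in means equals ATT + selection bias.
assert abs(naive - (att + selection_bias)) < 1e-9
```

Note that the ATT and selection bias lines use unobserved potential outcomes, so in real data only `naive` could actually be computed.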