# 1 Causal Inference and Potential Outcomes

This week we will introduce the topic of causal inference. We will outline a specific definition of causality using the potential outcomes framework, and will describe the fundamental problem of causal inference. We will highlight the persistent threat of selection bias in observational data and we will discuss differences between statistical inference and causal inference.

There are excellent introductions to the potential outcomes framework in Angrist and Pischke, 2014 (*Introduction*) and in Gerber and Green, 2012 (Chapter 1). Paul Holland’s article on Statistics and Causal Inference provides an excellent history of the framework, and relates the conception of causality we focus on to long-standing philosophical discussions. If you are looking for inspiration, and a fun read, then the first half of David Freedman’s article on Statistical Models and Shoe Leather (pp. 291-300) provides an interesting history of John Snow’s (no, not the guy from Game of Thrones) study of cholera in 19th century London. The second half of that article is also worth reading, as a fairly strong argument for why “statistical technique can seldom be an adequate substitute for good design, relevant data, and testing predictions against reality in a variety of settings.”

## 1.1 Seminar

For this week’s applied material we will review some key concepts of the “potential outcomes” framework and review some basics of using R. If you did not take PUBL0055 last term, or if you are struggling to remember R from all the way back in December, then you should work through the exercises on the R refresher page before completing this assignment.

### 1.1.1 Potential Outcomes review

The following questions are designed to help you get familiar with the potential outcomes framework for causal inference that we discussed in the lecture.

- Explain the notation \(Y_{0i}\).

The potential outcome for subject \(i\) if this subject were *untreated*. Put another way: the untreated potential outcome for subject \(i\).

- Explain the notation \(Y_{1i}\).

The potential outcome for subject \(i\) if this subject were *treated*. Put another way: the treated potential outcome for subject \(i\).

- Contrast the meaning of \(Y_{0i}\) with the meaning of \(Y_i\).

The first is the potential outcome for subject \(i\) if this subject were untreated. The second is simply the observed outcome for subject \(i\).
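The link between the two can be written compactly. As a sketch, assuming a binary treatment indicator \(D_i\) (1 if treated, 0 if control), the observed outcome is whichever potential outcome the treatment status selects:

\[
Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}
\]

So \(Y_i = Y_{1i}\) for a treated unit and \(Y_i = Y_{0i}\) for an untreated unit.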

- Can we observe both \(Y_{0i}\) and \(Y_{1i}\) for any individual unit at the same time?

No, recall that:

- \(Y_{0i}\) = the potential outcome for \(i\) under control.
- \(Y_{1i}\) = the potential outcome for \(i\) under treatment.

Only one of the two potential outcomes for \(i\) can ever be realized, as a subject cannot be under control and treatment simultaneously. Consequently, observing both potential outcomes is not possible. This is known as the “fundamental problem of causal inference”.

- If \(D_i\) is a binary variable that gives the treatment status for subject \(i\) (1 if treated, 0 if control), what is the meaning of \(E[Y_{0i}|D_i = 1]\)?

The expected value of the potential outcome for subject \(i\) if the subject *were* untreated, given that this subject *actually* receives treatment. Put another way: the expected value of the untreated potential outcome for a subject in the treatment group.

- The table below contains the potential outcomes (\(Y_{1i}\) and \(Y_{0i}\)) and the treatment indicator (\(D_i\)) from a hypothetical experiment with 6 units. Complete the following calculations by hand.

| Unit | \(Y_{1i}\) | \(Y_{0i}\) | \(D_i\) |
|---|---|---|---|
| 1 | 2 | 2 | 1 |
| 2 | 3 | -1 | 1 |
| 3 | -1 | 9 | 1 |
| 4 | 17 | 8 | 0 |
| 5 | 12 | 9 | 0 |
| 6 | 9 | 1 | 0 |

- List the observed outcomes (\(Y_i\)) for the experiment based on the table above.
- Calculate the “true” average treatment effect (ATE) based on the potential outcomes.
- Calculate the “true” average treatment effect on the treated (ATT) based on the potential outcomes.
- Calculate the “estimated” average treatment effect based on the naive difference in group means for treatment and control conditions from the observed outcomes. Explain the difference between this estimate and the “true” average treatment effect.

Observed outcomes (\(Y_i\)) for units 1 to 6: 2, 3, -1, 8, 9, 1

Note that in a real experiment, these are the only values (along with the treatment assignment) we would observe. The other potential outcomes (i.e. \(Y_{1i}\) for observations with \(D_i = 0\) and \(Y_{0i}\) for observations with \(D_i = 1\)) are unobservable.

\(\tau_\text{ATE} = \frac{0 + 4 - 10 + 9 + 3 + 8}{6} = \frac{14}{6} = 2.33\)

\(\tau_\text{ATT} = \frac{0 + 4 - 10}{3} = \frac{-6}{3} = -2\)

\(\hat{\tau}_\text{ATE} = E[Y_i|D_i = 1] - E[Y_i|D_i=0] = \frac{2+3-1}{3} - \frac{8+9+1}{3} = \frac{4}{3} - \frac{18}{3} = -4.67\)

Recall that the expected difference in group means (DIGM) decomposes as \(E[\text{DIGM}] = E[\tau_i|D_i=1] + E[Y_{0i}|D_i = 1] - E[Y_{0i}|D_i=0]\), meaning that the difference in group means is an unbiased estimator of the ATE only when a) the ATE is equal to the ATT, and b) there is no selection bias.
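This decomposition can be reconstructed in a few steps. The sketch below uses only the notation from this section, writing \(\tau_i = Y_{1i} - Y_{0i}\) for the unit-level effect: start from the difference in observed group means, then add and subtract \(E[Y_{0i}|D_i = 1]\).

\[
\begin{aligned}
E[Y_i|D_i = 1] - E[Y_i|D_i = 0] &= E[Y_{1i}|D_i = 1] - E[Y_{0i}|D_i = 0] \\
&= E[Y_{1i}|D_i = 1] - E[Y_{0i}|D_i = 1] + E[Y_{0i}|D_i = 1] - E[Y_{0i}|D_i = 0] \\
&= \underbrace{E[\tau_i|D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_{0i}|D_i = 1] - E[Y_{0i}|D_i = 0]}_{\text{selection bias}}
\end{aligned}
\]

The first term is the ATT, which matches the ATE only if treated and untreated units have the same average effect. The second term is zero only if the two groups have the same average untreated potential outcome.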

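In this case neither condition holds, so the estimated ATE is very different from the “true” ATE:

- \(\tau_\text{ATT} = \frac{0 + 4 - 10}{3} = -2\), which differs from \(\tau_\text{ATE} = 2.33\)
- \(\text{Selection bias} = \frac{2 - 1 + 9}{3} - \frac{8 + 9 + 1}{3} = -2.667\)

The two terms sum to \(-2 - 2.667 = -4.67\), which is exactly the naive difference in group means.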
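The hand calculations for this exercise can be checked with a short script. The following is a minimal sketch in Python (the seminar itself uses R, so treat it as an illustration only). The variable names, the tuple layout, and the `avg` helper are all assumptions made for this example and are not part of the course material.

```python
# Potential outcomes and treatment indicator copied from the exercise table.
# Each row is (unit, y1, y0, d); this layout is an illustrative choice.
units = [
    (1, 2, 2, 1),
    (2, 3, -1, 1),
    (3, -1, 9, 1),
    (4, 17, 8, 0),
    (5, 12, 9, 0),
    (6, 9, 1, 0),
]


def avg(values):
    """Arithmetic mean of an iterable of numbers (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)


# Observed outcome: Y_1i for treated units, Y_0i for control units.
observed = [y1 if d == 1 else y0 for _, y1, y0, d in units]

# "True" ATE and ATT use both potential outcomes, which real data never reveal.
ate = avg(y1 - y0 for _, y1, y0, _ in units)
att = avg(y1 - y0 for _, y1, y0, d in units if d == 1)

# Naive difference in group means, computed from observed outcomes only.
treated_obs = [y for y, (_, _, _, d) in zip(observed, units) if d == 1]
control_obs = [y for y, (_, _, _, d) in zip(observed, units) if d == 0]
digm = avg(treated_obs) - avg(control_obs)

# Selection bias: E[Y_0i | D_i = 1] - E[Y_0i | D_i = 0].
selection_bias = (
    avg(y0 for _, _, y0, d in units if d == 1)
    - avg(y0 for _, _, y0, d in units if d == 0)
)

print("Observed outcomes:", observed)
print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
print(f"DIGM = {digm:.2f}, selection bias = {selection_bias:.2f}")
```

Running it reproduces the values worked out by hand: an ATE of about 2.33, an ATT of -2, a difference in group means of about -4.67, and selection bias of about -2.67. It also confirms that `digm` equals `att + selection_bias`, as the decomposition requires.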

## 1.2 Quiz

- Select the causal question among the following:

- Is the United States a democracy?
- **Does free trade make a country democratic?**
- Is the United States as democratic as Sweden?
- Will China become a democracy?

- What does “the fundamental problem of causal inference” imply?

- That to talk about causality we do not need counterfactuals
- That the treatment does not affect the outcome variable
- **That we can never observe the full schedule of potential outcomes**
- That untreated and treated potential outcomes do not both exist

- What does the “no interference assumption” (or SUTVA) state?

- **That assigning unit \(i\) to treatment does not affect unit \(j\)’s potential outcomes**
- That unit \(i\) is not treated unless unit \(j\) is treated
- That treated and untreated potential outcomes for unit \(i\) are identical
- That assigning unit \(i\) to treatment does not determine its observed outcome

- Is the difference in group means (DIGM) an unbiased estimator of the average treatment effect on the treated (ATT)?

- Yes, always
- **Yes, if selection bias equals 0**
- No, never
- No, unless treated potential outcomes for treated and untreated units are equal, in expectation

- What is causal inference?

- Inference from observed data to the data we would have measured if we had access to a broader population of units
- Inference from observed data to unmeasured quantities describing the same units
- Inference from two propositions assumed to be true to the truth of the conclusion
- **Inference from observed data about the data we would have observed for the same units given counterfactual circumstances**
