1 Causal Inference and Potential Outcomes

This week we will introduce the topic of causal inference. We will outline a specific definition of causality using the potential outcomes framework, and will describe the fundamental problem of causal inference. We will highlight the persistent threat of selection bias in observational data and we will discuss differences between statistical inference and causal inference.

There are excellent introductions to the potential outcomes framework in Angrist and Pischke (2014, Introduction) and in Gerber and Green (2012, Chapter 1). Paul Holland’s article “Statistics and Causal Inference” provides an excellent history of the framework and relates the conception of causality we focus on to long-standing philosophical discussions. If you are looking for inspiration and a fun read, the first half of David Freedman’s article “Statistical Models and Shoe Leather” (pp. 291-300) provides an interesting history of John Snow’s (no, not the guy from Game of Thrones) study of cholera in 19th century London. The second half of that article is also worth reading, as a fairly strong argument for why “statistical technique can seldom be an adequate substitute for good design, relevant data, and testing predictions against reality in a variety of settings.”


1.1 Seminar

For this week’s applied material we will review some key concepts of the “potential outcomes” framework and some basics of using R. If you did not take PUBL0055 last term, or if you are struggling to remember R from all the way back in December, then you should work through the exercises on the R refresher page before completing this assignment.

1.1.1 Potential Outcomes review

The following questions are designed to help you get familiar with the potential outcomes framework for causal inference that we discussed in the lecture.

  1. Explain the notation \(Y_{0i}\).

The potential outcome for subject \(i\) if this subject were untreated. Put another way: the untreated potential outcome for subject \(i\).

  1. Explain the notation \(Y_{1i}\).

The potential outcome for subject \(i\) if this subject were treated. Put another way: the treated potential outcome for subject \(i\).

  1. Contrast the meaning of \(Y_{0i}\) with the meaning of \(Y_i\).

The first is the potential outcome for subject \(i\) if this subject were untreated. The second is simply the observed outcome for subject \(i\).
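
One standard way to connect the two is the so-called switching equation. It is not written out in the text above, but it is a textbook identity of the potential outcomes framework. It uses a treatment indicator \(D_i\), which equals 1 if subject \(i\) is treated and 0 otherwise:

\[ Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i} \]

So \(Y_i = Y_{1i}\) for treated subjects and \(Y_i = Y_{0i}\) for untreated subjects.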

  1. Can we observe both \(Y_{0i}\) and \(Y_{1i}\) for any individual unit at the same time?

No, recall that:

  • \(Y_{0i}\) = the potential outcome for \(i\) under control.
  • \(Y_{1i}\) = the potential outcome for \(i\) under treatment.

Only one of the two potential outcomes for \(i\) can ever be realized, as a subject cannot be under control and treatment simultaneously. Consequently, observing both potential outcomes is not possible. This is known as the “fundamental problem of causal inference”.

  1. If \(D_i\) is a binary variable that gives the treatment status for subject \(i\) (1 if treated, 0 if control), what is the meaning of \(E[Y_{0i}|D_i = 1]\)?

The expected value of the potential outcome for subject \(i\) if the subject were untreated, given that this subject actually receives treatment. Put another way: the expected value of the untreated potential outcome for a subject in the treatment group.

  1. The table below contains the potential outcomes (\(Y_{1i}\) and \(Y_{0i}\)) and the treatment indicator (\(D_i\)) from a hypothetical experiment with 6 units. Complete the following calculations by hand.

    Unit   \(Y_{1i}\)   \(Y_{0i}\)   \(D_i\)
    1       2            2           1
    2       3           -1           1
    3      -1            9           1
    4      17            8           0
    5      12            9           0
    6       9            1           0

    1. List the observed outcomes (\(Y_i\)) for the experiment based on the table above.
    2. Calculate the “true” average treatment effect (ATE) based on the potential outcomes.
    3. Calculate the “true” average treatment effect on the treated (ATT) based on the potential outcomes.
    4. Calculate the “estimated” average treatment effect based on the naive difference in group means for treatment and control conditions from the observed outcomes. Explain the difference between this estimate and the “true” average treatment effect.

  1. 2, 3, -1, 8, 9, 1
    Note that in a real experiment, these are the only values (along with the treatment assignment) we would observe. The other potential outcomes (i.e. \(Y_{1i}\) for observations with \(D_i = 0\) and \(Y_{0i}\) for observations with \(D_i = 1\)) are unobservable.

  2. \(\tau_\text{ATE} = \frac{0 + 4 - 10 + 9 + 3 + 8}{6} = \frac{14}{6} = 2.33\)

  3. \(\tau_\text{ATT} = \frac{0 + 4 - 10}{3} = \frac{-6}{3} = -2\)

  4. \(\hat{\tau}_\text{ATE} = E[Y_i|D_i = 1] - E[Y_i|D_i=0] = \frac{2+3-1}{3} - \frac{8+9+1}{3} = \frac{4}{3} - \frac{18}{3} = -4.67\)
    Recall that \(E[DIGM] = E[\tau_i|D_i=1] + E[Y_{0i}|D_i = 1] - E[Y_{0i}|D_i=0]\), meaning that the difference in group means is an unbiased estimator of the ATE when both a) the ATE is equal to the ATT, and b) there is no selection bias.
    In this case neither is true, and so this estimated ATE is very different from the “true” ATE:

    • \(\tau_\text{ATT} = \frac{0 + 4 - 10}{3} = -2\)
    • \(\text{Selection bias} = E[Y_{0i}|D_i = 1] - E[Y_{0i}|D_i=0] = \frac{2 - 1 + 9}{3} - \frac{8 + 9 + 1}{3} = -2.67\)

    Together these two terms reproduce the naive estimate: \(-2 + (-2.67) = -4.67\).
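
The hand calculations can be checked in R. The following is a minimal sketch that types in the six units from the table by hand. The object names (`y1`, `y0`, `d`, `tau`, `sel_bias` and so on) are illustrative choices, not part of the course materials.

```r
# Potential outcomes and treatment indicator copied from the table
y1 <- c(2, 3, -1, 17, 12, 9)   # treated potential outcomes Y_1i
y0 <- c(2, -1, 9, 8, 9, 1)     # untreated potential outcomes Y_0i
d  <- c(1, 1, 1, 0, 0, 0)      # treatment indicator D_i

# Observed outcome: Y_1i for treated units, Y_0i for control units
y <- ifelse(d == 1, y1, y0)

tau      <- y1 - y0                                  # unit-level treatment effects
ate      <- mean(tau)                                # "true" ATE
att      <- mean(tau[d == 1])                        # "true" ATT
digm     <- mean(y[d == 1]) - mean(y[d == 0])        # naive difference in group means
sel_bias <- mean(y0[d == 1]) - mean(y0[d == 0])      # selection bias

round(c(ATE = ate, ATT = att, DIGM = digm, selection_bias = sel_bias), 2)

# The decomposition DIGM = ATT + selection bias holds exactly in this table
all.equal(digm, att + sel_bias)
```

Only `y` and `d` would be available in a real experiment. The quantities `tau`, `ate`, `att` and `sel_bias` can be computed here only because the table reveals both potential outcomes for every unit.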

1.1.2 Islam and Authoritarianism

In a famous paper titled “Islam and Authoritarianism”, Steven Fish asks whether Muslim societies are less democratic.[1] To find out, he runs a series of cross-sectional regressions of countries’ Freedom House scores (an indicator of the level of a country’s democracy) on characteristics of the countries, including whether they are predominantly Muslim.

The paper’s dataset is in the spreadsheet fishdata.csv, which you can download using the button at the top of the page. You should load the data using the read.csv() function, as follows:

fish <- read.csv("data/fishdata.csv")

This data contains the following variables (among others):

  • FHREVERS - Freedom House scores, a measure of democracy where higher values indicate that a country is more democratic and lower values indicate greater authoritarianism
  • MUSLIM - 1 if a country is predominantly Muslim, 0 otherwise
  • GDP90LGN - the country’s GDP in 1990
  • GRW7598P - the country’s average annual economic growth from 1975-98, in percent
  • BRITCOL - 1 if the country was a British colony, 0 otherwise
  • OPEC - 1 if the country is a member of the OPEC group of oil-exporting countries, 0 otherwise

We can look at the first 6 rows of this data using the head() function:

head(fish)
##   COUNTRY FHREVERS GDP90LGN ETHLING GRW7598P BRITCOL POSTCOM OPEC MUSLIM
## 1     Alb     4.10 2.925312    0.26     -0.8       0       1    0      1
## 2     Alg     2.15 3.214314    0.31      0.2       0       0    1      1
## 3     Arg     5.65 3.762078    0.21      0.6       0       0    0      0
## 4     Arm     3.95 3.187803    0.16     -6.6       0       1    0      0
## 5 Austria     7.00 4.435542    0.14      2.2       0       0    0      0
## 6  Austrl     7.00 4.255827    0.13      1.9       1       0    0      0
  1. Taking subsets and summarising variables
    1. How many countries are predominantly Muslim?
    2. What percentage of countries are predominantly Muslim?
    3. How many countries have GDP in 1990 of above 3.0?
    4. How many countries are both Muslim and a former British colony?
    5. How many countries have either average economic growth from 1975-98 of above 0.6% or GDP in 1990 of above 2.5?
    6. Create a new dataset consisting only of countries that are both Muslim and a member of OPEC

Code Hint: Use square brackets to denote subsets of a variable or dataset. You’ll also need the length() function.

sum(fish$MUSLIM)  # a: MUSLIM is coded 0/1, so the sum counts predominantly Muslim countries
## [1] 44
sum(fish$MUSLIM) / length(fish$MUSLIM)  # b: a proportion, i.e. about 29.7%
## [1] 0.2972973
length(fish$GDP90LGN[fish$GDP90LGN > 3])  # c
## [1] 88
length(fish$MUSLIM[fish$MUSLIM == 1 & fish$BRITCOL == 1])  # d
## [1] 7
length(fish$GDP90LGN[fish$GDP90LGN > 2.5 | fish$GRW7598P > 0.6])  # e
## [1] 134
fish.new <- fish[fish$MUSLIM == 1 & fish$OPEC == 1, ]  # f
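
An alternative to subsetting and calling length() is to sum or average a logical condition directly. The sketch below uses a small made-up data frame (`toy`, not the real fishdata.csv) that reuses three of the column names and deliberately contains one missing value, to show how the two approaches can differ when data are missing.

```r
# Hypothetical toy data (NOT the real fishdata.csv), with one missing GDP value
toy <- data.frame(
  MUSLIM   = c(1, 1, 0, 0, 1),
  BRITCOL  = c(1, 0, 1, 0, 1),
  GDP90LGN = c(2.8, 3.2, NA, 4.1, 3.5)
)

n_muslim   <- sum(toy$MUSLIM == 1)                      # TRUE counts as 1, FALSE as 0
pct_muslim <- mean(toy$MUSLIM == 1) * 100               # percentage rather than proportion
n_both     <- sum(toy$MUSLIM == 1 & toy$BRITCOL == 1)   # both conditions hold
n_gdp_sum  <- sum(toy$GDP90LGN > 3, na.rm = TRUE)       # missing comparisons are dropped
n_gdp_len  <- length(toy$GDP90LGN[toy$GDP90LGN > 3])    # subsetting keeps the NA as an element

c(n_muslim, pct_muslim, n_both, n_gdp_sum, n_gdp_len)
```

With complete data the two approaches agree. If a variable has missing values, the subsetting approach also counts the NA entries, so `sum(..., na.rm = TRUE)` is the safer habit.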
  1. What is the difference in mean Freedom House score between Muslim and Non-Muslim countries? Calculate it by hand.
mean(fish$FHREVERS[fish$MUSLIM==1]) - mean(fish$FHREVERS[fish$MUSLIM==0])
## [1] -2.198584

On average, Muslim countries score about 2.2 points lower on the Freedom House scale than non-Muslim countries.

  1. Is the difference in means calculated above likely to be biased? If so, in which direction and why?

This is only a bivariate relationship, without any controls. In reality, Muslim countries may differ from non-Muslim countries in many other ways that also affect their level of democracy, e.g. their level of economic development. This suggests that the difference in means is likely to be a biased estimate of the causal effect.

Some obvious omitted variables, such as economic development, are likely to be positively correlated with a country’s democracy level but negatively correlated with being predominantly Muslim. This biases the difference in means downward (it is more negative than the true effect), because we have not accounted for the fact that predominantly Muslim countries are also poorer on average. Other potential omitted variables are likely to be negatively correlated with the democracy level but positively correlated with being predominantly Muslim, including OPEC membership (e.g., the ‘resource curse’ theory in political science suggests that access to oil revenues acts as a ‘curse’, allowing governments to buy off citizens without introducing democracy). Again, this suggests downward bias from having failed to account for the fact that predominantly Muslim societies are also more likely to be OPEC members.
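
The logic of downward bias can be illustrated with simulated data. This is a sketch under stated assumptions, not an analysis of the Fish data: `x` is a hypothetical confounder that raises the outcome and lowers the chance of treatment, `d` is a generic binary treatment, and the true effect of `d` on `y` is set to zero by construction. The seed, sample size and coefficient values are arbitrary.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 10000

x <- rnorm(n)                     # hypothetical confounder
d <- rbinom(n, 1, plogis(-x))     # treatment is less likely when x is high
y <- 1 * x + rnorm(n)             # outcome depends on x only: true effect of d is 0

naive    <- mean(y[d == 1]) - mean(y[d == 0])   # naive difference in group means
adjusted <- coef(lm(y ~ d + x))[["d"]]          # same comparison, holding x fixed

c(naive = naive, adjusted = adjusted)
```

Because treated units have systematically lower `x`, the naive difference comes out clearly negative even though the treatment does nothing, while the estimate that adjusts for `x` should be close to zero. Adjustment works here only because the simulation has a single confounder and it is observed.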

  1. Conduct a t-test for the difference in means calculated above using the t.test() function. Is the difference statistically significant?
t.test(fish$FHREVERS[fish$MUSLIM==1], fish$FHREVERS[fish$MUSLIM==0])
## 
##  Welch Two Sample t-test
## 
## data:  fish$FHREVERS[fish$MUSLIM == 1] and fish$FHREVERS[fish$MUSLIM == 0]
## t = -9.6267, df = 128.17, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.650477 -1.746691
## sample estimates:
## mean of x mean of y 
##  2.690455  4.889038

The difference is statistically significant at all conventional significance levels, because the t-statistic is -9.63 and the p-value is extremely close to zero.

  1. Conduct the t-test again, this time coding it by hand. Confirm that your answer is identical to the answer you calculated in the question above.
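d <- mean(fish$FHREVERS[fish$MUSLIM == 1]) - mean(fish$FHREVERS[fish$MUSLIM == 0])  # difference in means
# Standard error: square root of the sum of each group's variance divided by its size
se <- sqrt(
  var(fish$FHREVERS[fish$MUSLIM == 1]) / length(fish$FHREVERS[fish$MUSLIM == 1]) +
  var(fish$FHREVERS[fish$MUSLIM == 0]) / length(fish$FHREVERS[fish$MUSLIM == 0])
)
d / se  # t-statistic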
## [1] -9.626655

The t-statistic is identical to the one reported by t.test() (-9.6267 after rounding).
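
The hand calculation can be extended to the degrees of freedom and the p-value. The sketch below uses two small made-up samples (`x` and `y`, not the Fish data) and the Welch-Satterthwaite approximation, which is what t.test() uses by default when `var.equal` is left at `FALSE`.

```r
# Hypothetical toy samples (NOT the Fish data)
x <- c(2.1, 3.4, 1.9, 2.8, 3.0)
y <- c(4.9, 5.6, 4.2, 6.1, 5.0, 4.4)

v_x <- var(x) / length(x)         # squared standard error of mean(x)
v_y <- var(y) / length(y)         # squared standard error of mean(y)

t_stat <- (mean(x) - mean(y)) / sqrt(v_x + v_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_x + v_y)^2 / (v_x^2 / (length(x) - 1) + v_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in Welch test
res <- t.test(x, y)
all.equal(c(t_stat, df, p_value),
          unname(c(res$statistic, res$parameter, res$p.value)))
```

The degrees of freedom are generally not a whole number, which is why the t.test() output above reports df = 128.17.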

  1. Estimate a linear regression with FHREVERS as the dependent variable and MUSLIM as the independent variable. How do the results from your regression relate to the difference-in-means that you calculated in question 2?
mean(fish$FHREVERS[fish$MUSLIM==1]) - mean(fish$FHREVERS[fish$MUSLIM==0])
## [1] -2.198584
mod <- lm(FHREVERS ~ MUSLIM, data = fish)
coef(mod)[[2]]
## [1] -2.198584

The results are identical, of course. On average, Muslim countries score about 2.2 points lower than non-Muslim countries. Remember that the slope coefficient from a linear regression equals the difference in means whenever the only predictor in the model is a binary variable.
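
The same equivalence can be verified on any data with a 0/1 predictor. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `d` and `y`) and also checks what the intercept and the regression t-statistic correspond to.

```r
# Hypothetical toy data (NOT the Fish data)
toy <- data.frame(
  d = c(1, 1, 1, 0, 0, 0, 0),
  y = c(2.5, 3.1, 1.8, 5.2, 4.7, 6.0, 5.5)
)

mod <- lm(y ~ d, data = toy)

intercept <- coef(mod)[["(Intercept)"]]   # mean of y in the d == 0 group
slope     <- coef(mod)[["d"]]             # difference in group means

all.equal(intercept, mean(toy$y[toy$d == 0]))
all.equal(slope, mean(toy$y[toy$d == 1]) - mean(toy$y[toy$d == 0]))

# The default lm() standard error assumes equal variances in both groups, so its
# t-statistic matches t.test(..., var.equal = TRUE) rather than the Welch default
t_lm     <- summary(mod)$coefficients["d", "t value"]
t_pooled <- unname(t.test(toy$y[toy$d == 1], toy$y[toy$d == 0],
                          var.equal = TRUE)$statistic)
all.equal(t_lm, t_pooled)
```

So the point estimate from lm() always matches the difference in means, but its standard error and t-statistic will generally differ slightly from the Welch test used earlier.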


1.2 Quiz

  1. Select the causal question among the following:
    1. Is the United States a democracy?
    2. Does free trade make a country democratic?
    3. Is the United States as democratic as Sweden?
    4. Will China become a democracy?
  1. What does “the fundamental problem of causal inference” imply?
    1. That to talk about causality we do not need counterfactuals
    2. That the treatment does not affect the outcome variable
    3. That we can never observe the full schedule of potential outcomes
    4. That untreated and treated potential outcomes do not both exist
  1. What does the “no interference assumption” (or SUTVA) state?
    1. That assigning unit \(i\) to treatment does not affect unit \(j\)’s potential outcomes
    2. That unit \(i\) is not treated unless unit \(j\) is treated
    3. That treated and untreated potential outcomes for unit \(i\) are identical
    4. That assigning unit \(i\) to treatment does not determine its observed outcome
  1. Is the difference in group means (DIGM) an unbiased estimator of the average treatment effect on the treated (ATT)?
    1. Yes, always
    2. Yes, if selection bias equals 0
    3. No, never
    4. No, unless treated potential outcomes for treated and untreated units are equal, in expectation
  1. What is causal inference?
    1. Inference from observed data to the data we would have measured if we had access to a broader population of units
    2. Inference from observed data to unmeasured quantities describing the same units
    3. Inference from two propositions assumed to be true to the truth of the conclusion
    4. Inference from observed data about the data we would have observed for the same units given counterfactual circumstances

  1. M. Steven Fish (2002). “Islam and Authoritarianism.” World Politics, 55 (1): 4-37.