7 Instrumental Variables (I)


7.1 Lecture review and reading

In lecture this week we mostly focussed on IV as a strategy for dealing with non-compliance (one-sided and two-sided) in randomized experiments. The clearest exposition of the idea of non-compliance is given in the two chapters of the Gerber and Green textbook (chapter 5 and chapter 6), although note that they generally avoid any mention of the phrase “instrumental variable” (I don’t really know why. Some of the linguistic decisions in that book are a little idiosyncratic.). The discussion of IV estimators is also very good (and short) in the Sovey and Green paper. This paper also emphasises the point I made in the lecture: most applications in political science do not use IV for addressing non-compliance in randomized experiments (the case we discussed this week), but instead are applications where researchers use IV as a method for overcoming selection bias in cross-sectional observational studies (the case we will discuss next week). I particularly liked the “reader’s checklist” they provide at the end of the paper, which gives good, straightforward advice to anyone thinking about using IV as an estimation strategy for causal effects.

The chapter on IV estimation in MHE is good, though much of the material is beyond the level you (or anyone else, to be honest) will need to implement a good IV design. The treatment in the Mastering ’Metrics book is somewhat easier, and also includes some very interesting applications of IV when used to address non-compliance.

7.2 Seminar

7.2.1 Data

We will use one main dataset this week:

  1. Sesame Street data

7.2.2 Children’s Television and Educational Performance

Sesame Street


Can educational television programmes improve children’s learning outcomes? Sesame Street is an American television programme aimed at young children. The creators of Sesame Street decided from the very beginning of the show’s production that a central goal would be to educate as well as entertain its audience. As Malcolm Gladwell argued, “Sesame Street was built around a single, breakthrough insight: that if you can hold the attention of children, you can educate them”. In addition to building the show around a carefully constructed educational curriculum, the show’s producers also worked closely with educational researchers to determine whether the show’s content was effectively improving its young viewers’ numeracy and literacy skills.

The dataset contained in sesame_experiment.dta includes information on 240 children who were randomly assigned to two groups. The treatment of interest here is watching Sesame Street, but clearly it is not possible to force children to watch a TV show or (perhaps even harder) to refrain from watching, and so watching the show cannot be randomized. Instead, in this study, researchers randomized whether children were encouraged to watch the show. More specifically, when the study was run in the 1970s, Sesame Street was on the air each day between 9am and 10am. The parents of children in the treatment group were encouraged to show Sesame Street to their children on a regular basis, while parents of the children in the control group were given no such encouragement. Because it is only encouragement that is randomized here, there is the possibility of non-compliance – i.e. some children will not watch Sesame Street even though they are in the treatment condition, and some children will watch Sesame Street even though they are in the control condition.

The data includes the following variables:

  1. encouraged – 1 if the child was encouraged to watch Sesame Street, 0 otherwise
  2. watched – 1 if the child watched Sesame Street regularly, 0 otherwise
  3. letters – the score of the child on a literacy test
  4. age – age of the child (in months)
  5. sex – sex of the child (1 = male, 2 = female)
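The solutions that follow refer to a data frame called sesame, but the step that creates it is not shown. A minimal sketch of loading the file, assuming it sits in the working directory and that the haven package is installed (both are assumptions, as is the choice of haven rather than another Stata reader):

```r
# Assumed location of the data file; adjust the path to wherever you saved it.
data_path <- "sesame_experiment.dta"

if (file.exists(data_path) && requireNamespace("haven", quietly = TRUE)) {
  # read_dta() reads Stata .dta files into a data frame (tibble)
  sesame <- haven::read_dta(data_path)
  str(sesame)
} else {
  message("Data file or the haven package not found; nothing was loaded.")
}
```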

Question 1. Compliance and the intention-to-treat

a. In the context of this specific example, define the following unit types:

  1. Compliers
  2. Always-takers
  3. Never-takers
  4. Defiers

Solution

  1. The children who would watch Sesame Street only when encouraged to do so, and would not watch Sesame Street only when not encouraged
  2. The children who would watch Sesame Street regardless of encouragement
  3. The children who would not watch Sesame Street regardless of encouragement
  4. The children who would not watch Sesame Street only when encouraged to do so, and would watch Sesame Street only when not encouraged

The fourth type, the defiers, are assumed not to exist. This is the monotonicity assumption.
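The four unit types are defined by pairs of potential treatment values: whether a child would watch if encouraged, and whether the same child would watch if not encouraged. The sketch below spells out that mapping. The names d1, d0 and classify_unit are hypothetical and introduced only for illustration; in real data we never observe both values for the same child, so this classification cannot be applied to individuals directly.

```r
# d1: would the child watch if encouraged (Z = 1)?      1 = yes, 0 = no
# d0: would the child watch if not encouraged (Z = 0)?  1 = yes, 0 = no
classify_unit <- function(d1, d0) {
  ifelse(d1 == 1 & d0 == 0, "complier",
  ifelse(d1 == 1 & d0 == 1, "always-taker",
  ifelse(d1 == 0 & d0 == 0, "never-taker", "defier")))
}

# All four possible combinations of potential treatment values
types <- data.frame(d1 = c(1, 1, 0, 0), d0 = c(0, 1, 0, 1))
types$type <- classify_unit(types$d1, types$d0)
print(types)
```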

b. Calculate the proportion of children in the treatment group who did not watch Sesame Street. Calculate the proportion of children in the control group who did watch Sesame Street. What type of non-compliance occurred in this experiment?

Solution

  # counts in each assignment/treatment group
  table(sesame$encouraged, sesame$watched)
   
      0   1
  0  40  48
  1  14 138
  # proportions in each assignment/treatment group
  prop.table(table(sesame$encouraged, sesame$watched),1)
   
             0          1
  0 0.45454545 0.54545455
  1 0.09210526 0.90789474

Of the 88 children assigned to the control condition, 48 actually watched Sesame Street.

Of the 152 children assigned to the treatment condition, 14 did not watch Sesame Street.

In addition to the fact that clearly Sesame Street was a very popular programme in the 1970s, this analysis tells us that we have two-sided non-compliance in this experiment. A number of treated units failed to take the assigned treatment, and a number of units took the treatment even though they were assigned to the control group.

c. Calculate the proportion of compliers in this experiment. Which assumptions are required for us to identify this quantity?

Solution

We can calculate the proportion of compliers via \(E[D_i|Z_i = 1] - E[D_i|Z_i = 0] = \bar{D}_{Z_i = 1} - \bar{D}_{Z_i = 0}\)

  d_z_1 <- mean(sesame$watched[sesame$encouraged == 1])
  d_z_0 <- mean(sesame$watched[sesame$encouraged == 0])
  proportion_compliers <- d_z_1 - d_z_0

  proportion_compliers
[1] 0.3624402

Roughly 36% of children in the sample are compliers.

We require two assumptions to identify the proportion of compliers.

  1. No defiers – we have to rule out any defiers from the sample
  2. Independence of the instrument – we assume that the instrument (\(Z_i\), whether a child was encouraged to watch Sesame Street) is randomly assigned. This assumption allows us to infer that the proportion of always-takers in the control group (something that is observable) is equal to the proportion of always-takers in the treatment group (something that is not observable)
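Under these two assumptions the shares of all three remaining unit types can be recovered from the cross-tabulation. The sketch below rebuilds the table from the counts reported earlier rather than from the data file, so it runs on its own; the object names are illustrative.

```r
# Counts copied from the cross-tabulation (rows: encouraged, columns: watched)
counts <- matrix(c(40, 48, 14, 138), nrow = 2, byrow = TRUE,
                 dimnames = list(encouraged = c("0", "1"),
                                 watched    = c("0", "1")))
row_props <- prop.table(counts, margin = 1)

# With no defiers, anyone who watches without encouragement is an always-taker,
# and anyone who does not watch despite encouragement is a never-taker.
share_always_takers <- row_props["0", "1"]
share_never_takers  <- row_props["1", "0"]
share_compliers     <- 1 - share_always_takers - share_never_takers

round(c(always_takers = share_always_takers,
        never_takers  = share_never_takers,
        compliers     = share_compliers), 3)
```

The complier share obtained this way is the same quantity as the difference in mean take-up between the two assignment groups computed above.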

d. Calculate the Intention-to-Treat effect (ITT). What is the interpretation of the ITT here?

Solution

We can calculate the ITT via \(E[Y_i|Z_i = 1] - E[Y_i|Z_i = 0] = \bar{Y}_{Z_i = 1} - \bar{Y}_{Z_i = 0}\)

  # Using the difference in means:
  y_z_1 <- mean(sesame$letters[sesame$encouraged == 1])
  y_z_0 <- mean(sesame$letters[sesame$encouraged == 0])
  itt <- y_z_1 - y_z_0

  itt
[1] 2.875598
  # Using OLS:
  summary(lm(letters ~ encouraged, data = sesame))

Call:
lm(formula = letters ~ encouraged, data = sesame)

Residuals:
    Min      1Q  Median      3Q     Max 
-24.920 -10.796  -4.796  12.423  38.080 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   24.920      1.421   17.54   <2e-16 ***
encouraged     2.876      1.786    1.61    0.109    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.33 on 238 degrees of freedom
Multiple R-squared:  0.01078,   Adjusted R-squared:  0.006623 
F-statistic: 2.593 on 1 and 238 DF,  p-value: 0.1086

The ITT estimate is equal to 2.876

The ITT estimates the causal effect of treatment assignment on the outcome of interest. Here, the ITT estimates the causal effect of being encouraged to watch Sesame Street on a child’s score on the literacy test. The estimate of 2.876 implies that this encouragement increases literacy scores by nearly 3 points, on average. Note however that this effect is not very precisely estimated (p = 0.109) and does not represent a substantively large effect (the standard deviation of the outcome variable here is a little over 13).
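To make the points about precision and magnitude concrete, a 95% confidence interval and a rough standardised effect can be computed from the regression output. This is a sketch using the rounded values printed in the summary, and it uses the residual standard error as a stand-in for the standard deviation of the outcome, which is an approximation.

```r
# Values copied (rounded) from the lm() summary above
itt_hat  <- 2.876   # coefficient on encouraged
se_hat   <- 1.786   # its standard error
df_resid <- 238     # residual degrees of freedom
resid_sd <- 13.33   # residual standard error (proxy for the outcome SD)

# 95% confidence interval based on the t distribution
t_crit <- qt(0.975, df = df_resid)
ci_95  <- itt_hat + c(-1, 1) * t_crit * se_hat

# ITT expressed in (approximate) standard deviations of the outcome
standardised_itt <- itt_hat / resid_sd

ci_95
standardised_itt
```

The interval includes zero, which is consistent with the p-value of 0.109.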

Question 2. Local Average Treatment Effect (LATE)

a. What does the LATE estimate?

Solution

The LATE is the average effect of the treatment on the outcome for compliers: here, the children who watch Sesame Street when encouraged and do not watch when not encouraged.
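In symbols, writing \(Y_i(1), Y_i(0)\) for a child's potential test scores with and without watching and \(D_i(1), D_i(0)\) for potential take-up with and without encouragement (this potential-outcomes notation is an assumption here and may differ from the lecture slides), the estimand and its usual identification result can be sketched as:

\[
\tau_{LATE} = E[Y_i(1) - Y_i(0) \mid D_i(1) > D_i(0)] = \frac{E[Y_i|Z_i = 1] - E[Y_i|Z_i = 0]}{E[D_i|Z_i = 1] - E[D_i|Z_i = 0]}
\]

The second equality holds when the instrument is independent of potential outcomes and potential take-up, the exclusion restriction holds, there are no defiers, and the denominator is non-zero. The numerator is the ITT and the denominator is the proportion of compliers, both defined in Question 1.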

b. Estimate the LATE for this example. You should do this in three ways (all of which can be found on the lecture slides!):

  1. Using the Wald estimator
  2. Using “manual” two-stage least squares (i.e. you need to specify the regressions yourself)
  3. Using ivreg from the AER package. (You will need to install this package first using install.packages("AER"), and then load it using the library function)

Solution

  ## Wald Estimator
  itt/proportion_compliers
[1] 7.933993
  # Equivalently
  first_stage <- lm(watched ~ encouraged, data = sesame)
  reduced_form <- lm(letters ~ encouraged, data = sesame)
  coef(reduced_form)[2]/coef(first_stage)[2]
encouraged 
  7.933993 
  ## Two stage least squares (manual)
  first_stage <- lm(watched ~ encouraged, data = sesame)
  sesame$fitted_d <- predict(first_stage)
  second_stage <- lm(letters ~ fitted_d, data = sesame)
  coef(second_stage)[2]
fitted_d 
7.933993 
  ## Two stage least squares (IV reg)
  library(AER)
  tsls_ivreg <- ivreg(formula = letters ~ watched,
                      instruments = ~ encouraged,
                      data = sesame)

  summary(tsls_ivreg)

Call:
ivreg(formula = letters ~ watched | encouraged, data = sesame)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.593  -9.593  -4.527  10.723  34.473 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   20.593      3.659   5.628 5.11e-08 ***
watched        7.934      4.606   1.723   0.0863 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.46 on 238 degrees of freedom
Multiple R-Squared: 0.1355, Adjusted R-squared: 0.1318 
Wald test: 2.967 on 1 and 238 DF,  p-value: 0.08626 

The LATE estimate is equal to 7.934

The estimate tells us that, for those children who complied with the encouragement, watching Sesame Street increases a child’s score on the literacy test by nearly 8 points, on average. This estimate is much larger than the ITT, and (as evidenced by the standard error, t-value, and p-value from the ivreg summary) the effect is significantly different from zero at the 90% confidence level.
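All three approaches give the same point estimate, but the standard error printed by the manual second-stage lm() should not be used: it is computed from residuals based on the fitted treatment values rather than the observed ones. The sketch below illustrates this on simulated data (not the Sesame Street data; every number in the simulation is made up), using only base R and the conventional 2SLS variance formula for a single instrument.

```r
set.seed(1)
n <- 2000

# Simulated encouragement design (all parameter values are hypothetical)
z    <- rbinom(n, 1, 0.5)
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.36, 0.55, 0.09))
d    <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))
y    <- 20 + 8 * d + rnorm(n, sd = 13)
sim  <- data.frame(y, d, z)

# Wald estimator: reduced form divided by first stage
wald <- (mean(sim$y[sim$z == 1]) - mean(sim$y[sim$z == 0])) /
        (mean(sim$d[sim$z == 1]) - mean(sim$d[sim$z == 0]))

# Manual two-stage least squares
first_stage  <- lm(d ~ z, data = sim)
sim$d_hat    <- fitted(first_stage)
second_stage <- lm(y ~ d_hat, data = sim)
b            <- coef(second_stage)

# Naive standard error: residuals are computed with d_hat
naive_se <- summary(second_stage)$coefficients["d_hat", "Std. Error"]

# Conventional 2SLS standard error: residuals use the observed treatment d
resid_iv   <- sim$y - b[1] - b[2] * sim$d
sigma2     <- sum(resid_iv^2) / (n - 2)
correct_se <- sqrt(sigma2 / sum((sim$d_hat - mean(sim$d_hat))^2))

c(wald = wald, tsls = unname(b[2]),
  naive_se = naive_se, correct_se = correct_se)
```

In practice this is one reason to rely on a dedicated routine such as ivreg for inference and to treat the manual version as a check on the point estimate only.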

c. You have now estimated two treatment effects: the ITT and LATE. Which is of greater interest to the TV show’s producers?

Solution

The LATE seems like a much more valuable quantity of interest to the producers of Sesame Street than the ITT. Because the ITT combines information on both the extent to which the treatment was adhered to by the respondents, and the effect of the treatment itself on the outcome, it obscures clear conclusions about the effectiveness of Sesame Street as an educational programme.

By contrast, the LATE gives a very clear answer: it tells us that, for those children who complied with the encouragement to watch or not watch Sesame Street, the causal effect of watching the programme was to increase their literacy skills by 8 points on average. From the point of view of the TV producers, this is helpful information as it directly informs them about the educational impact of their show on those who watch. Of course, it may be the case that the compliers in this example are very different from the always-takers or never-takers, and so the generalizability of this result cannot be established from this single experiment.

Question 3. Exclusion restriction

What does the assumption of the exclusion restriction mean in this example? Are you convinced that the exclusion restriction holds here?

Solution

The exclusion restriction states that the instrument, Z, can only affect the outcome, Y, through its effect on the treatment, D. Here, this implies that for those children whose behaviour would not have been changed by the encouragement (i.e. never-takers and always-takers), there can be no effect of the encouragement on outcomes. In other words, there is no effect of encouragement on learning outcomes aside from when encouragement successfully prompts children to watch Sesame Street.

It seems likely that the exclusion restriction is a reasonable assumption in this setting. If the parents of a child are encouraged to sit their child in front of Sesame Street, it is difficult to think of a way that that assignment might affect their child’s literacy skills other than if they actually comply with the treatment.
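To see why the assumption matters, the following sketch simulates what would happen to the Wald estimator if encouragement did have a direct effect on test scores. The data are simulated and all parameter values (including the size of the hypothetical direct effect) are invented for illustration; nothing here is evidence about the actual experiment.

```r
set.seed(42)
n <- 100000

# Hypothetical population: shares of unit types and effect sizes are made up
z    <- rbinom(n, 1, 0.5)
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.36, 0.55, 0.09))
d    <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))

true_effect   <- 8   # effect of watching on the test score
direct_effect <- 2   # hypothetical direct effect of encouragement on the score
noise         <- rnorm(n, sd = 13)

y_valid    <- 20 + true_effect * d + noise    # exclusion restriction holds
y_violated <- y_valid + direct_effect * z     # exclusion restriction fails

# Wald estimator: reduced form divided by first stage
wald <- function(y, d, z) {
  (mean(y[z == 1]) - mean(y[z == 0])) / (mean(d[z == 1]) - mean(d[z == 0]))
}

first_stage   <- mean(d[z == 1]) - mean(d[z == 0])
wald_valid    <- wald(y_valid, d, z)
wald_violated <- wald(y_violated, d, z)

c(valid = wald_valid, violated = wald_violated,
  bias = direct_effect / first_stage)
```

The gap between the two estimates equals the direct effect divided by the first stage, so even a small violation is magnified when the proportion of compliers is modest.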