9 Regression Discontinuity Designs

A regression discontinuity design (RDD, for short) arises when the selection of a unit into a treatment group depends on a covariate score that creates some discontinuity in the probability of receiving the treatment. In this lecture we will consider both “sharp” and “fuzzy” RDDs.

Regression discontinuity designs are appropriate in cases where the probability that a unit is treated jumps at some specific value (\(c\)) of a running variable (\(X_i\)). The RDD is powered by the idea that jumps in the outcome (\(Y\)) that occur at the same value of the running variable can – under certain assumptions – be attributed to the causal effect of the treatment. The basic logic behind this is that we normally expect most outcomes to evolve smoothly, and so as long as we are able to model that smooth evolution to a sufficient degree of accuracy, we can calculate the difference in outcomes that occur at the cutoff and count that as the treatment effect.

There are good treatments (get it?) of the RDD in both the Mostly Harmless book, and the Mastering ’Metrics books, but it probably is easiest to gain intuition about these methods from seeing plenty of examples. The paper we covered during the lecture is this one by Andy Hall, which uses a very common RD setting – election results which lead to one type of candidate being elected rather than another type – but in an innovative way that speaks to some important debates in American Politics. For an example of population-threshold based RDDs, this paper by Andy Eggers is very nice. Eggers uses the fact that municipalities in France use one electoral system (Majoritarian) when they have a population of less than 3500 and another (PR) when they have more than 3500 people. Finally, this paper is really nice, where Ferwerda and Miller implement a geographic-RDD to investigate the causal effects of the Vichy regime in World War Two on the strength of the French resistance (you could also read this paper in which Kocher and Monteiro (2016) challenge the “as if random” interpretation of Ferwerda and Miller’s RDD setting.)

9.1 Seminar

9.1.1 Islamic Political Rule and Women’s Empowerment - Meyerson (2014)

Does Islamic political control affect women’s empowerment? Many countries have seen Islamic parties coming to power through democratic elections in recent years, and due to strong support among religious conservatives, constituencies with Islamic rule often tend to exhibit poor women’s rights. Does this reflect a causal relationship? Erik Meyerson examines this question by using data on a set of Turkish municipalities from 1994 when an Islamic party won multiple municipal mayor seats across the country. We will use this data and implement a regression discontinuity design to compare municipalities where this Islamic party barely won or lost elections.

This week we will need the rdd package for estimating the regression discontinuity models, and the associated plotting functions. Install this package now and load it into your environment.

install.packages("rdd")
library(rdd)
library(ggplot2)
library(ggthemes)

You can load the data via:

islamic <- read.csv("data/islamic_women.csv")

The islamic_women.csv dataset includes the following variables:

margin – The margin of the Islamic party’s win or loss in the 1994 election (numbers greater than zero imply that the Islamic party won, and numbers less than zero imply that the Islamic party lost. 0 is an exact tie.)
school_men – the secondary school completion rate for men aged 15-20
school_women – the secondary school completion rate for women aged 15-20
log_pop – log of the municipality population in 1994
sex_ratio – sex ratio of the municipality in 1994
log_area – log of the municipality area in 1994

Estimating the LATE

Create a treatment variable within the data.frame islamic called islamic_win, that indicates whether or not the Islamic party won the 1994 election. In how many municipalities did the Islamic party win the election?

  # Create the treatment variable
  islamic$islamic_win <- as.numeric(islamic$margin > 0)
  
  # Tabulate the treatment variable
  table(islamic$islamic_win)

## 
##    0    1 
## 2332  328

The Islamic party won in 328 municipalities.

Calculate the difference in means in secondary school completion rates for women between regions where Islamic parties won and lost in 1994. Do you think this is a credible estimate of the causal effect of Islamic party control? Why or why not?

  # Create the treatment variable
diff_in_means <- mean(islamic$school_women[islamic$islamic_win==1],na.rm=T) - 
  mean(islamic$school_women[islamic$islamic_win==0],na.rm=T)

  diff_in_means

## [1] -0.02577901

The difference in means is -0.0258. A naive interpretation of this would be that Islamic party control leads to lower female education rates. But this is not a very credible estimate of the causal effect, due to selection bias. The treated and untreated regions are likely to have very different potential outcomes; they are imbalanced. For example, it is likely that places that strongly support Islamic parties are less supportive of female education to begin with.

Before estimating the RDD, we need to select an appropriate bandwidth for the analysis. You can select the optimal bandwidth for this data using the Imbens-Kalyanaram procedure, implemented using the IKBandwidth function in the rdd package. After you have selected the optimal bandwidth, explain what the bandwidth means in this case. Finally, using the bandwidth that you just calculated, create a new dataset that includes only those observations that fall within the optimal bandwidth of the running variable. Hint: You may need to read the help file (?IKbandwidth) to see what the function requires.

 band <- IKbandwidth(islamic$margin, islamic$school_women)
  band

## [1] 0.2398152

This gives a bandwidth of 0.239. This suggests that the optimal bias-variance trade-off is achieved for this specific RD analysis by only including municipalities where there vote share margin for the Islamic party was in the interval from -0.239 to 0.239.

 islamic_rdd  <- islamic[abs(islamic$margin) < band , ]

Find the Local Average Treatment Effect of Islamic party control on women’s secondary school education at the threshold, using the dataset you created in the question above and a simple linear regression that includes the treatment and running variable. How credible do you think this result is?

# model
linear_rdd_model <- lm(school_women ~ margin + islamic_win,
                       data=islamic_rdd)
# regression table
texreg::screenreg(linear_rdd_model)

## 
## ========================
##              Model 1    
## ------------------------
## (Intercept)     0.13 ***
##                (0.01)   
## margin         -0.22 ***
##                (0.04)   
## islamic_win     0.03 ** 
##                (0.01)   
## ------------------------
## R^2             0.03    
## Adj. R^2        0.02    
## Num. obs.    1020       
## ========================
## *** p < 0.001; ** p < 0.01; * p < 0.05

# plotting the relationship
## predicted values control
control <- data.frame(margin = seq(-round(band,2),0,.01), islamic_win = 0)
control$pred <- predict(linear_rdd_model, newdata = control)
## predicted values treated
treated <- data.frame(margin = seq(0,round(band,2),.01), islamic_win = 1)
treated$pred <- predict(linear_rdd_model, newdata = treated)
## combine  
pred <- rbind(control,treated)
pred$islamic_win <- as.factor(pred$islamic_win)
## plot
ggplot(islamic, aes(x = margin, y = school_women)) +
  geom_point(alpha=.2) + # observed data
  geom_vline(xintercept=0,linetype="dotted") + # vertical line at cutoff
  geom_line(data = pred, aes(y = pred, color = islamic_win), linewidth = 1) + # regression line on either side of cutoff
  geom_segment(x = 0, xend =  0, 
               y = pred$pred[pred$margin == 0 & pred$islamic_win == 0], yend = pred$pred[pred$margin == 0 & pred$islamic_win == 1],
               size = 1.5, color = "red") + # vertical line for LATE
  scale_y_continuous("Women Secondary School Completion Rate", limits = c(0,.6)) +
  scale_x_continuous("Islamic Party Election Margin",limits = c(-band,band)) +
  scale_colour_brewer("Islamic Party Win",palette = "Set2") +
  theme_clean() +
  lemon::coord_capped_cart(bottom="both",left="both") +
  theme(plot.background = element_rect(color=NA),
        panel.grid.major.y = element_blank(),
        legend.position = "bottom")

This gives a LATE estimate of 0.033 with a p value of 0.003, indicating that the effect is strongly statistically significant. This result may not be very credible because it requires that the relationship between female education and the running variable is linear and has the same slope on either side of the cutoff. This is a very strong assumption.

Use RD estimation to find the Local Average Treatment Effect of Islamic party control on women’s secondary school education at the threshold, using local linear regression estimated with the RDestimate function from the rdd package. Does the estimate differ from question 4?

The RDestimate function takes a number of arguments:

Argument	Description
`formula`	The `formula` of the RDD, which takes the form `outcome_variable ~ running_variable`
`data`	The name of the data of interest. In our case, we want to use `islamic`, not `islamic_rdd`. This is because the `RDestimate` function will apply the bandwidth restriction for us automatically when we specify the `bw` option as we do below.
`bw`	The bandwidth at which to estimate the RDD. Here, this argument should be `bw = band` where `band` is the optimal bandwidth you calculated above.
`cutpoint`	The cutpoint, i.e. the value of the running variable at which the disconuity is assumed to occur. Here, this argument should be 0.

  rd_est <- RDestimate(school_women~margin,
                       cutpoint=0, bw=band, data=islamic)
  summary(rd_est)

## 
## Call:
## RDestimate(formula = school_women ~ margin, data = islamic, cutpoint = 0, 
##     bw = band)
## 
## Type:
## sharp 
## 
## Estimates:
##            Bandwidth  Observations  Estimate  Std. Error  z value  Pr(>|z|)   
## LATE       0.2398     1020          0.02963   0.01241     2.388    0.01694   *
## Half-BW    0.1199      589          0.02502   0.01646     1.520    0.12859    
## Double-BW  0.4796     2050          0.02279   0.01009     2.257    0.02399   *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## F-statistics:
##            F       Num. DoF  Denom. DoF  p        
## LATE        4.990  3         1016        3.865e-03
## Half-BW     1.703  3          585        3.304e-01
## Double-BW  25.770  3         2046        4.441e-16

This gives a LATE estimate of 0.0296 with a p value of 0.016, indicating that the effect is clearly significant at the 5% level. The estimate is only slightly lower than in the manual version of RDD that we estimated above.

Plot the relationship between the running variable and the outcome, with a line representing the local linear regression either side of the cutoff. Use your plot to explain why your results do or do not differ strongly between the linear RDD model you estimated in question 4 and the non-linear RDD model you estimated in question 5. Note: You do not need to produce this plot manually! You can simply call the plot() function directly on the object you created in question 5. You can use the range argument to control the x axis. For more options, look at the help file for ?plot.RD.

  plot(rd_est, range=c(-0.6,0.6))
  abline(v=0)

The plot indicates that the relationship between \(Y\) and \(\tilde{X}\) does appear to be approximately linear on either side of the cutoff. That explains why we received similar answers in both questions 4 and 5 above: linearity is a reasonable fit to the data here. That will not always be the case, though, and it is usually good practice to show that you results are insensitive to the functional form specification of the running variable.

Assessing the robustness of RD estimates

Perform placebo tests to check that the relationship between the running variable and outcome is not subject to discontinuities at values other than zero. To do this, re-estimate the RD, but varying the placebo cutoffs. Try values ranging from -0.1 to 0.1 incremented by 0.01. What do you conclude? Hint: Rather than writing the code for each separate cut-off, you can use a for loop. On each iteration of the for loop, estimate the RDD using that specific cutoff, and save the LATE estimates and the associated confidence intervals. Once the for loop is complete, we can simply plot the estimates and confidence intervals over the range of the cutoffs we specify. If you get stuck, click below to see some code to help you get started.

Reveal Code Hint

## Set the vector of cut-off points
cutpoint <- seq(from = -0.1, to = 0.1, by = 0.01)

## Setup some variables that can be used to store the estimates and confidence intervals
est <- rep(NA, length(cut))
se <- rep(NA, length(cut))

## Loop over the cut-offs
for (i in 1:length(cut)){
  
  ## Estimate the RD model with the appropriate cut-off here (using cutpoint[i], for instance)
  
  ## Extract the LATE coefficient from the model here (if rd_est is the estimated model, the LATE coefficient can be found in rd_est$est[1]) and assign the estimated coefficient to the appropriate position in est 
  
  ## Extract the LATE standard error from the model here (if rd_est is the estimated model, the LATE standard error can be found in rd_est$se[1]) and assign the estimated standard error to the appropriate position in se

}

## create a data.frame object with, as variables, cut, est and se

## calculate the confidence intervals

# Doing this manually is tedious, as you can see:
# summary(RDestimate(school_women ~ margin, bw = band, cutpoint = -0.1, data = islamic))
# summary(RDestimate(school_women ~ margin, bw = band,cutpoint = -0.05, data = islamic))
# summary(RDestimate(school_women ~ margin, bw = band,cutpoint = 0.05, data = islamic))
# summary(RDestimate(school_women ~ margin, bw = band,cutpoint = 0.1, data = islamic))

# Let's use a loop instead!
## Set the vector of cut-off points
cut <- seq(from = -.1, to = 0.1, by = 0.01)

## Setup some variables that can be used to store the estimates and confidence intervals
est <- rep(NA, length(cut))
se <- rep(NA, length(cut))

for (i in 1:length(cut)){
  ## Estimate the RD model with the appropriate cut-off here (using cutpoint[i], for instance)
  rd_est_placebo <- RDestimate(school_women ~ margin,
                               cutpoint=cut[i], bw = band,
                               data = islamic)
  
     est[i] <- rd_est_placebo$est[1]
     se[i] <- rd_est_placebo$se[1]
}

## create a data.frame object with, as variables, cut, est and se
out <- data.frame(est,se,cut)
out$truecut <- out$cut==0

## calculate the confidence intervals
out$lo <- out$est - 1.96*out$se
out$hi <- out$est + 1.96*out$se

## plot
ggplot(out, aes(x= cut, y= est,color=truecut)) +
  geom_vline(xintercept = 0, linetype="dotted") +
  geom_point() +
  geom_linerange(aes(ymin=lo,ymax=hi)) +
  geom_hline(yintercept = 0) +
  scale_x_continuous("Cut point",breaks = seq(from = -.1, to = 0.1, by = 0.05)) +
  scale_y_continuous("Estimate",breaks = seq(from = -.06, to = 0.04, by = 0.02)) +
  scale_color_manual(values = c('black','red')) +
  theme_clean() +
  lemon::coord_capped_cart(bottom="both",left="both") +
  theme(plot.background = element_rect(color=NA),
        panel.grid.major.y = element_blank(),
        legend.position = "none",
        axis.ticks.length = unit(2,"mm"))

While there are no significant LATE estimates for the -0.05, 0.05 or 0.1 cutoffs, there is a significant (negative) treatment effect at the -0.1 cutoff. That is, the p-values for three of the cutoffs are large, while the p-value for the -0.1 cutoff is just 0.002. This is not hugely encouraging, as it suggests that there are significant discontinuities in the outcome variable for values of the running variable other than 0. This constitutes evidence against a true RD effect because it indicates that the discontinuity we measured at 0 could be only one of many discontinuities in the data. In other words, the effect at 0 could have arisen due to chance alone.

However, it is also noteworthy that the statistically significant effect estimated with cut-offs 0.1 and 0.09 is actually the other way around than the one at the ‘true’ cut-off zero. You can also see that in the plot below. Comparing this to the plot we got in the previous question, we see that those observations just below the (real) threshold (so where the islamic party narrowly lost) are the ones with surprisingly low share of women in school and are the ones pulling the estimated line downwards. One could surmise that candidates that won but narrowly against the islamic party got scared of their electoral success and over-compensated by trying to cater to a religious-conservative electorate which resulted in lower shares of women in school. Note that this is really just conjecture.

plot(RDestimate(school_women ~ margin, bw = band, cutpoint = -0.1, data = islamic),range = c(-0.6,0.6))

An alternative robustness check is to examine whether there are significant treatment effects for pre-treatment covariates that should not be affected by the treatment. Use the three pre-treatment covariates here – log_pop, sex_ratio, log_area – as outcome variables in new RD estimates. Do you find any significant effects of the treatment on these variables? Note: remember to set the bw argument to be equal to the bandwith that we selected in question 3 for all of these models.

rd_logpop <- RDestimate(log_pop ~ margin, bw = band, cutpoint = 0, data = islamic)
rd_sexratio <- RDestimate(sex_ratio ~ margin, bw = band, cutpoint = 0, data = islamic)
rd_logarea <- RDestimate(log_area ~ margin, bw = band, cutpoint = 0, data = islamic)

## You could just look at this with summary
# summary(rd_logpop)
# summary(rd_sexratio)
# summary(rd_logarea)

## But we're fancy, so we will put this in a nice table
library(tidyverse)
library(kableExtra)
out <- data.frame(var = c("log population","gender ratio","log area"),
                  est = c(rd_logpop$est[1],rd_sexratio$est[1],rd_logarea$est[1]),
                  se = c(rd_logpop$se[1],rd_sexratio$se[1],rd_logarea$se[1]),
                  p.value = c(rd_logpop$p[1],rd_sexratio$p[1],rd_logarea$p[1])) %>%
  mutate(across(where(is.numeric),~round(.,3))) %>%
  `colnames<-`(c("variable","estimate","standard error","p-value"))

kable(out, booktabs=T) %>% kable_styling(full_width = F)

variable	estimate	standard error	p-value
log population	0.080	0.220	0.715
gender ratio	-0.001	0.016	0.974
log area	0.071	0.178	0.689

There is no evidence of discontinuities for these any of these variables. They all have small placebo LATE estimates and high p-values. This suggests that, at least based on these observed covariates, there was no sorting around the threshold: local randomisation is likely to hold.

Perform a McCrary test (another way to check for sorting at the theshold) using the DCdensity function. Plot and interpret the results.

DCdensity(islamic$margin, verbose =TRUE)

## Assuming cutpoint of zero.
## Using calculated bin size:  0.009 
## Using calculated bandwidth:  0.165

## Log difference in heights is  -0.095  with SE  0.147 
##   this gives a z-stat of  -0.650 
##   and a p value of  0.515

## [1] 0.5153874

The p-value for the resultant test statistic is 0.51, meaning that we cannot reject the null hypothesis of no discontinuous jump in the density of the running variable. Visually, you can also see that there is no strong evidence for this in the plot.

Examine the sensitivity of the main RD result to the choice of bandwidth by calculating and plotting RD estimates and their associated 95% confidence intervals for a range of bandwidths from 0.05 to 0.6. To what extent do the results depend on the choice of bandwidth? Hint: Contrary to what the previous version of this seminar task said, you actually do not need a loop, as you can just add a verctor of bandwidths in the bw= argument in the RDestimate() function. Use the lecture slides to see how to get the results into a table you could use to plot the results.

## Set-up the vector of bandwidths
bandwidths <- seq(from = 0.05, to = 0.6, by = 0.005)

## RDD estimation
rd_est <- RDestimate(school_women ~ margin, bw = bandwidths, data = islamic)

## Create data frame
results <- data.frame(bw = bandwidths,
                      est = rd_est$est,
                      se = rd_est$se,
                      lo = rd_est$est - 1.96*rd_est$se,
                      hi = rd_est$est + 1.96*rd_est$se,
                      opt = bandwidths==round(band,2))

## ggplot
ggplot(results, aes(x= bw, y= est, color=opt)) +
  geom_linerange(aes(ymin=lo,ymax=hi),color="gray") +
  geom_point() +
  geom_hline(yintercept = 0, linetype="dotted") +
  scale_x_continuous("Bandwidth",breaks = seq(0.05,0.6,.05)) +
  scale_color_manual(values = c("black","red")) +
  ylab("Estimate") +
  theme_clean() +
  lemon::coord_capped_cart(bottom="both",left="both") +
  theme(plot.background = element_rect(color=NA),
        panel.grid.major.y = element_blank(),
        legend.position = "none",
        axis.ticks.length = unit(2,"mm"))

# BaseR
plot(x = bandwidths, y = results$est, 
       type = "l",
       lwd = 2,
       ylim = c(-0.1,0.1),
       xlab = "Bandwidth", ylab="Estimate")
  abline(h = 0)
  lines(x = bandwidths, y = results$hi, lty = 3)
  lines(x = bandwidths, y = results$lo, lty = 3)
  legend("topright", 
         legend = c("RD Estimate","95% Confidence Interval"),
         lty = c(1,3))

The plot shows a fairly typical pattern in RD analysis. At very low bandwidths, very little data is used in estimation, making the variance very high. Unsurprisingly therefore, the confidence interval is very wide at low bandwidths (recall that the optimal IK bandwidth is chosen with the twin goals of reducing both bias and variance). As the threshold (and therefore the effective sample size) increases, the confidence interval decreases.

The size of the RD estimate does not change much as the bandwidth increases, which is reassuring: we would not want our result to be extremely sensitive to the bandwidth choice. Its statistical significance does change somewhat as the threshold changes. It is significant at the 95% level for bandwidths between approximately 0.15 and 0.55, becoming statistically indistinguishable from 0 again at higher bandwidths. Again, this is reasurring as the main result holds across a wide range of bandwidths, and it would not be very defensible in the first place to do RD estimation with a bandwidth as high as 0.55, where we are using data that is very far indeed from the cutoff. Overall, if you saw this plot in a published paper you should feel reasurred that the author’s results are not driven primarily by the choice of threshold.

9.2 Quiz

Why is a valid regression discontinuity design (RDD) able to overcome the typical problem of selection bias in observational designs?

Because it effectively controls for all covariates (observable and unobservable) for all units in the sample, allowing to estimate an unbiased ATE
Because it effectively removes all heterogeneity between treatment and control groups and all time heterogeneity, allowing to recover an unbiased ATT
Because it compares observed outcomes for units just below and just above a cutoff on a running variable. In a valid design, treatment assignment just above/below the cutoff is as good as random
Because it uses a weighted average of most similar untreated units as a counterfactual for the treated unit, allowing to construct a credible synthetic counterfactual

What assumptions does a sharp RDD require?

That the running variable is inducing at least some variation in the treatment condition (1\(^{st}\) stage assumption)
That potential outcomes evolve smoothly around the cutoff. Hence: no covariate is discontinuous around the cutoff and units cannot self-sort on either side
The strength of a sharp RDD is precisely that it requires no assumption at all, hence its high internal validity
That outcomes of units just below and just above the threshold would have followed the same time-trend, hadn’t it been for the treatment

What causal quantity is a sharp RDD estimating?

A local average treatment effect (LATE), which is specific to units with value of the running variable close to the cutoff, within a certain bandwidth
A local average treatment effect (LATE), which averages effects for units that are geographically proximate to each other, within a certain spatial bandwidth
A local average treatment effect (LATE), that relates only to units who comply with treatment assignment
A local average treatment effect (LATE), an extrapolation specific to units that are located precisely on the cutoff

How does enlarging the bandwidth affect an RDD?

Enlarging the bandwidth decreases the number of observations, increases external validity, and increases statistical power
Enlarging the bandwidth increases the number of observations, decreases internal validity, and increases statistical power
It is not possible to enlarge the bandwidth once an optimal value is selected with the Imbens-Kalyanaraman algorithm
Enlarging the bandwidth increases the number of observations, decreases internal validity, and decreases statistical power

What is the procedure implemented by a fuzzy RDD?

Select covariates that are strongest in explaining the outcome variable. Give more weight to control units that are most similar to the treated unit with respect to them
Select, for each treated unit, the most similar untreated unit with respect to the included covariates. Keep only observations that satisfy this criterion and compute a difference in mean outcome
Restrict data within a bandwidth around the threshold. Code a treatment variable using the running variable. Fit a certain model (linear, multiplicative, polynomial, local linear). Take the coefficient of the treatment assignment as the estimated LATE
Restrict data within a bandwidth around the threshold. Code an instrument using the running variable. Proceed with a 2SLS procedure including the running variable in both stages. Take the coefficient of the fitted treatment assignment in the 2nd stage as the estimated LATE

Which of the following is NOT true about fuzzy RDD?

Units are assigned to treatment around the threshold, but not all units comply
The probability of treatment around the threshold is O on one side and 1 on the other
It requires only one identification assumption, the same as a sharp RDD
It combines regression discontinuity and instrumental variable estimation to overcome non-compliance with the treatment assignment at the threshold