9 Regression Discontinuity Designs
A regression discontinuity design (RDD, for short) arises when the selection of a unit into a treatment group depends on a covariate score that creates some discontinuity in the probability of receiving the treatment. In this lecture we will consider both “sharp” and “fuzzy” RDDs.
Regression discontinuity designs are appropriate in cases where the probability that a unit is treated jumps at some specific value (\(c\)) of a running variable (\(X_i\)). The RDD is powered by the idea that jumps in the outcome (\(Y\)) that occur at the same value of the running variable can – under certain assumptions – be attributed to the causal effect of the treatment. The basic logic is that we normally expect most outcomes to evolve smoothly in the running variable, so as long as we are able to model that smooth evolution to a sufficient degree of accuracy, we can calculate the difference in outcomes that occurs at the cutoff and count that as the treatment effect.
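For the sharp case, where treatment switches on exactly when \(X_i \geq c\), the jump described above can be written as a difference of one-sided limits. This is the standard textbook formulation rather than anything specific to this course, and the label \(\tau_{\text{SRD}}\) is notation introduced here for illustration:

\[
\tau_{\text{SRD}} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] \; - \; \lim_{x \uparrow c} E[Y_i \mid X_i = x]
\]

If the average potential outcomes are continuous in \(X_i\) at \(c\) (the "smooth evolution" assumption), this difference identifies the average treatment effect for units at the cutoff.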
There are good treatments (get it?) of the RDD in both the Mostly Harmless book and the Mastering ’Metrics book, but it is probably easiest to gain intuition about these methods by seeing plenty of examples. The paper we covered during the lecture is this one by Andy Hall, which uses a very common RD setting (election results that lead to one type of candidate being elected rather than another) in an innovative way that speaks to some important debates in American politics. For an example of a population-threshold RDD, this paper by Andy Eggers is very nice. Eggers uses the fact that municipalities in France use one electoral system (majoritarian) when they have a population of fewer than 3500 and another (PR) when they have more than 3500 people. Finally, this paper is really nice: Ferwerda and Miller implement a geographic RDD to investigate the causal effects of the Vichy regime in World War Two on the strength of the French resistance. You could also read this paper, in which Kocher and Monteiro (2016) challenge the “as if random” interpretation of Ferwerda and Miller’s RDD setting.
9.1 Seminar
9.1.1 Islamic Political Rule and Women’s Empowerment - Meyersson (2014)
Does Islamic political control affect women’s empowerment? Many countries have seen Islamic parties come to power through democratic elections in recent years and, because these parties draw strong support from religious conservatives, constituencies under Islamic rule often exhibit poor women’s rights. Does this reflect a causal relationship? Erik Meyersson examines this question using data on a set of Turkish municipalities from 1994, when an Islamic party won multiple municipal mayor seats across the country. We will use this data and implement a regression discontinuity design to compare municipalities where this Islamic party barely won or barely lost the election.
This week we will need the `rdd` package for estimating the regression discontinuity models, and for the associated plotting functions. Install this package now and load it into your environment.
You can load the data via:
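A minimal sketch of these two steps, assuming the `rdd` package is available from your CRAN mirror and that the data file has been saved as `islamic_women.csv` in your working directory. The path is an assumption, so adjust it to wherever your copy lives:

```r
# install.packages("rdd")  # run once if the package is not yet installed
library(rdd)

# Assumed location of the data file; change this to match your setup
data_path <- "islamic_women.csv"

if (file.exists(data_path)) {
  # Read the seminar data into a data.frame called islamic
  islamic <- read.csv(data_path)
  str(islamic)
} else {
  message("File not found: ", data_path)
}
```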
The `islamic_women.csv` dataset includes the following variables:
- `margin` – the margin of the Islamic party’s win or loss in the 1994 election (numbers greater than zero imply that the Islamic party won, numbers less than zero imply that the Islamic party lost, and 0 is an exact tie)
- `school_men` – the secondary school completion rate for men aged 15-20
- `school_women` – the secondary school completion rate for women aged 15-20
- `log_pop` – log of the municipality population in 1994
- `sex_ratio` – sex ratio of the municipality in 1994
- `log_area` – log of the municipality area in 1994
Estimating the LATE
- Create a treatment variable within the data.frame `islamic`, called `islamic_win`, that indicates whether or not the Islamic party won the 1994 election. In how many municipalities did the Islamic party win the election?
- Calculate the difference in means in secondary school completion rates for women between regions where Islamic parties won and lost in 1994. Do you think this is a credible estimate of the causal effect of Islamic party control? Why or why not?
- Before estimating the RDD, we need to select an appropriate bandwidth for the analysis. You can select the optimal bandwidth for this data using the Imbens-Kalyanaraman procedure, implemented in the `IKbandwidth` function in the `rdd` package. After you have selected the optimal bandwidth, explain what the bandwidth means in this case. Finally, using the bandwidth that you just calculated, create a new dataset that includes only those observations that fall within the optimal bandwidth of the running variable. Hint: You may need to read the help file (`?IKbandwidth`) to see what the function requires.
- Find the Local Average Treatment Effect of Islamic party control on women’s secondary school education at the threshold, using the dataset you created in the question above and a simple linear regression that includes the treatment and running variable. How credible do you think this result is?
- Use RD estimation to find the Local Average Treatment Effect of Islamic party control on women’s secondary school education at the threshold, using local linear regression estimated with the `RDestimate` function from the `rdd` package. Does the estimate differ from question 4?
The `RDestimate` function takes a number of arguments:
| Argument | Description |
|---|---|
| `formula` | The formula of the RDD, which takes the form `outcome_variable ~ running_variable`. |
| `data` | The name of the data of interest. In our case, we want to use `islamic`, not `islamic_rdd`. This is because the `RDestimate` function will apply the bandwidth restriction for us automatically when we specify the `bw` option as we do below. |
| `bw` | The bandwidth at which to estimate the RDD. Here, this argument should be `bw = band`, where `band` is the optimal bandwidth you calculated above. |
| `cutpoint` | The cutpoint, i.e. the value of the running variable at which the discontinuity is assumed to occur. Here, this argument should be 0. |
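To illustrate how these arguments fit together, here is a minimal sketch on simulated data. The data frame `sim` and its coefficients are made up for illustration and are not the seminar data. Only the variable names mirror the real dataset, and the sketch assumes the `rdd` package is installed.

```r
library(rdd)  # assumes the rdd package is installed

set.seed(42)

# Simulated stand-in data (NOT the seminar data): a running variable centred
# on 0 and an outcome with a built-in jump of 0.03 at the cutoff
n <- 1000
margin <- runif(n, min = -0.5, max = 0.5)
school_women <- 0.15 - 0.20 * margin + 0.03 * (margin > 0) + rnorm(n, sd = 0.05)
sim <- data.frame(margin = margin, school_women = school_women)

# Imbens-Kalyanaraman bandwidth for this outcome and running variable
band <- IKbandwidth(X = sim$margin, Y = sim$school_women, cutpoint = 0)

# Local linear RD estimate: pass the full data and let bw do the restriction
rd_est <- RDestimate(school_women ~ margin, data = sim, bw = band, cutpoint = 0)

summary(rd_est)

# The first element corresponds to the bandwidth supplied in bw
rd_est$est[1]
rd_est$se[1]
```

When a single bandwidth is supplied, the summary also reports estimates at half and double that bandwidth, which gives a quick first impression of bandwidth sensitivity.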
- Plot the relationship between the running variable and the outcome, with a line representing the local linear regression on either side of the cutoff. Use your plot to explain why your results do or do not differ strongly between the linear RDD model you estimated in question 4 and the local linear RDD model you estimated in question 5. Note: You do not need to produce this plot manually! You can simply call the `plot()` function directly on the object you created in question 5. You can use the `range` argument to control the x axis. For more options, see the help file (`?plot.RD`).
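The earlier steps in this list (building the treatment indicator, taking the naive difference in means, and fitting a linear regression inside a bandwidth) can be sketched in base R. This again uses simulated data rather than the seminar data, and the bandwidth of 0.2 is an arbitrary placeholder, not the Imbens-Kalyanaraman value.

```r
set.seed(42)

# Simulated stand-in data (NOT the seminar data); the true jump at 0 is 0.03
n <- 1000
sim <- data.frame(margin = runif(n, min = -0.5, max = 0.5))

# Treatment indicator: 1 if the running variable is above the cutoff of 0
sim$islamic_win <- as.numeric(sim$margin > 0)
sim$school_women <- 0.15 - 0.20 * sim$margin + 0.03 * sim$islamic_win +
  rnorm(n, sd = 0.05)

# Number of treated units
sum(sim$islamic_win)

# Naive difference in means, which ignores the trend in the running variable
diff_means <- mean(sim$school_women[sim$islamic_win == 1]) -
  mean(sim$school_women[sim$islamic_win == 0])
diff_means

# Keep only observations within the (placeholder) bandwidth of the cutoff
band <- 0.2
sim_rdd <- subset(sim, abs(margin) <= band)

# Linear regression on the treatment and the running variable
fit <- lm(school_women ~ islamic_win + margin, data = sim_rdd)
late_hat <- unname(coef(fit)["islamic_win"])
late_hat
```

Because the simulated outcome falls with the running variable, the naive comparison and the regression within the bandwidth can point in different directions, which is the kind of gap the second question asks you to think about.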
Assessing the robustness of RD estimates
- Perform placebo tests to check that the relationship between the running variable and the outcome is not subject to discontinuities at values other than zero. To do this, re-estimate the RD while varying the placebo cutoff. Try values ranging from -0.1 to 0.1 in increments of 0.01. What do you conclude? Hint: Rather than writing the code for each separate cut-off, you can use a `for` loop. On each iteration of the loop, estimate the RDD using that specific cutoff, and save the LATE estimate and its standard error. Once the loop is complete, you can calculate the confidence intervals and plot the estimates and intervals over the range of cutoffs. If you get stuck, the code hint below may help you get started.
```r
## Set the vector of cut-off points
cutpoint <- seq(from = -0.1, to = 0.1, by = 0.01)

## Set up vectors to store the estimates and standard errors
est <- rep(NA, length(cutpoint))
se <- rep(NA, length(cutpoint))

## Loop over the cut-offs
for (i in seq_along(cutpoint)) {
  ## Estimate the RD model with the appropriate cut-off here (using cutpoint[i], for instance)

  ## Extract the LATE coefficient from the model here (if rd_est is the estimated model,
  ## the LATE coefficient can be found in rd_est$est[1]) and assign it to est[i]

  ## Extract the LATE standard error from the model here (if rd_est is the estimated model,
  ## the LATE standard error can be found in rd_est$se[1]) and assign it to se[i]
}

## Create a data.frame object with cutpoint, est and se as variables

## Calculate the confidence intervals
```
- An alternative robustness check is to examine whether there are significant treatment effects for pre-treatment covariates that should not be affected by the treatment. Use the three pre-treatment covariates here – `log_pop`, `sex_ratio`, `log_area` – as outcome variables in new RD estimates. Do you find any significant effects of the treatment on these variables? Note: remember to set the `bw` argument equal to the bandwidth that we selected in question 3 for all of these models.
- Perform a McCrary test (another way to check for sorting at the threshold) using the `DCdensity` function. Plot and interpret the results.
- Examine the sensitivity of the main RD result to the choice of bandwidth by calculating and plotting RD estimates and their associated 95% confidence intervals for a range of bandwidths from 0.05 to 0.6. To what extent do the results depend on the choice of bandwidth? Hint: Contrary to what the previous version of this seminar task said, you do not actually need a loop, as you can just supply a vector of bandwidths to the `bw=` argument of the `RDestimate()` function. Use the lecture slides to see how to get the results into a table you could use to plot the results.