6 Synthetic Control
Don’t have a good control unit to use in a difference-in-differences design? Don’t panic; just synthesise one. Synthetic control approaches allow for causal inferences based on similar assumptions to difference-in-differences, but are particularly well suited for situations in which the treatment occurs for a single unit. By providing a systematic way to choose comparison units, synthetic control is a good method for application to comparative case studies.
Synthetic control methods are a relatively new addition to the roster of causal inference techniques used in applied political science work. Because of this, there is no standard textbook treatment of these methods yet, at least none that we are aware of. The best place to start reading is therefore this article by Abadie et al. (2015) in the AJPS (you will recognise the Germany example from the lecture). The same authors also wrote a more detailed exposition of the method in this paper, which goes further into the technical detail behind the estimation strategy. You can also consult this more recent paper by (the same) Abadie (2021).
For applications, the one we will focus on throughout the seminar is this paper by David Hope, who is at King’s College London. It may be inspirational to know that David started the project that resulted in this paper in a causal inference class very similar to the one you are currently taking! Another nice example of the synthetic control method is this paper by Benjamin Born and coauthors: in this study, the authors use synthetic control to estimate the costs of the Brexit vote to the UK economy. Finally, this very recent paper by Alrababa’h et al. (2021) studies the effect of football player Mohamed Salah’s arrival at Liverpool F.C. on the incidence of hate crimes in the Liverpool area.
6.1 Seminar
The seminar this week is devoted to learning how to use the `tidysynth` package in `R`. This package has been developed to make it easier to implement synthetic control designs, though as you will see it does have a somewhat idiosyncratic coding style that is very step-by-step. You will need to install the package and load it as we have done in previous weeks:
# install.packages("tidysynth") # Remember that you only need to install the package once
library(tidysynth)
# You may also need the ggplot2 package for further plot customisation (which we all love!)
library(ggplot2)
6.1.1 The Effect of Economic and Monetary Union on Spain’s Current Account Balance – Hope (2016)
In early 2008, about a decade after the Euro was first introduced, the European Commission published a document looking back at the currency’s short history and concluded that the European Economic and Monetary Union was a “resounding success”. By the end of 2009 Europe was at the beginning of a multiyear sovereign debt crisis, in which several countries – including a number of Eurozone members – were unable to repay or refinance their government debt or to bail out over-indebted banks. Although the causes of the Eurocrisis were many and varied, one aspect of the pre-crisis era that became particularly damaging after 2008 was the large and persistent current account deficits of many member states. Current account imbalances – which capture the inflows and outflows of both goods and services and investment income – were a marked feature of the post-EMU, pre-crisis era, with many countries in the Eurozone running persistent current account deficits (indicating that they were net borrowers from the rest of the world). Large current account deficits make economies more vulnerable to external economic shocks because of the risk of a sudden stop in the capital inflows used to finance government deficits.
David Hope investigates the extent to which the introduction of the Economic and Monetary Union in 1999 was responsible for the current account imbalances that emerged in the 2000s. Using the synthetic control method, Hope evaluates the causal effect of EMU on current account balances in 11 countries between 1980 and 2010. In this exercise, we will focus on just one country – Spain – and evaluate the causal effect of joining EMU on the Spanish current account balance. Of the \(J\) countries in the sample, therefore, \(j = 1\) is Spain, and \(j = 2, \dots, 16\) will represent the “donor” pool of countries. In this case, the donor pool consists of 15 OECD countries that did not join the EMU: Australia, Canada, Chile, Denmark, Hungary, Israel, Japan, Korea, Mexico, New Zealand, Poland, Sweden, Turkey, the UK and the US.
The `hope_emu.csv` file contains data on these 16 countries across the years 1980 to 2010. The data includes the following variables:
- `period` – the year of observation
- `country_ID` – the country of observation
- `country_no` – a numeric country identifier
- `CAB` – current account balance
- `GDPPC_PPP` – GDP per capita, purchasing power adjusted
- `invest` – total investment as a % of GDP
- `gov_debt` – government debt as a % of GDP
- `openness` – trade openness
- `demand` – domestic demand growth
- `x_price` – price level of exports
- `gov_deficit` – government primary balance as a % of GDP
- `credit` – domestic credit to the private sector as a % of GDP
- `GDP_gr` – GDP growth %
Use the `read.csv` function to load the downloaded data into R now.
- Plot the trajectory of the Spanish current account balance over time (in red), compared to three other countries of your choice. Plot an additional dashed vertical line in 1999 to mark the introduction of the EMU. Would you be happy using any of them on their own as the control group?
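One possible shape for this plot is sketched below in base R graphics rather than `ggplot2`. The data frame `emu_toy`, its CAB values and the `"AUS"` identifier are made up for illustration; with the real data you would use your loaded data and three comparison countries of your choice.

```r
# Hypothetical toy panel; the CAB values are invented for illustration only.
emu_toy <- data.frame(
  period     = rep(1996:2002, times = 2),
  country_ID = rep(c("ESP", "AUS"), each = 7),
  CAB        = c(-1, -1, -2, -3, -4, -4, -5,
                 -3, -4, -5, -5, -4, -3, -3)
)

# Empty canvas spanning all years and all CAB values
plot(range(emu_toy$period), range(emu_toy$CAB), type = "n",
     xlab = "Year", ylab = "Current account balance (% of GDP)")

# One line per country: the treated unit in red, comparisons in grey
for (id in unique(emu_toy$country_ID)) {
  d <- emu_toy[emu_toy$country_ID == id, ]
  lines(d$period, d$CAB, col = if (id == "ESP") "red" else "grey40")
}

# Dashed vertical line marking the introduction of EMU
abline(v = 1999, lty = 2)
```

Drawing every country on the same axes makes it easy to judge whether any single comparison country tracks Spain closely before 1999.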
- Preparing the synthetic control data:
The `tidysynth` package takes data in a somewhat cumbersome way. You first need to ‘explain’ to `R` what the outcome variable, the unit and time identifiers, the treated unit and the start of the treatment period are. This is done with the function `synthetic_control()`. Use this to prepare the `emu` data and store the result in a new object named `emu_synth`. The main arguments you will need to use are summarised in the table below. Try on your own first, and then look at the solution below.
Argument | Description |
---|---|
`data` | The data.frame that we want to use for the analysis |
`outcome` | The name of the dependent variable in the analysis (here, `CAB`) |
`unit` | The name of the variable that identifies each unit |
`time` | The name of the variable that identifies each time period (must be numeric) |
`i_unit` | The identifying value of the treated unit (must correspond to the value for the treated unit in `unit`) |
`i_time` | The time period in which the treatment starts (here, 1999) |
`generate_placebos` | A logical value requesting that placebo versions of the data are created. Defaults to `TRUE` but, for now, set it to `FALSE`. |
Reveal answer
- Adding the predictor data:
In the next step, we need to specify which covariates should be considered in the calculation of the synthetic control weights for each donor unit. Remember, these covariates should be chosen according to whether they are predictive of the outcome. Remember also that the values need to be aggregated (usually averaged) across the entire pre-treatment period. The covariates are added with the function `generate_predictor()`, which takes the arguments listed below. Again, try this yourself, and then have a look at the solution.
Argument | Description |
---|---|
`data` | The name of the object created with `synthetic_control()` (here, `emu_synth`) |
`time_window` | The time window over which the data should be aggregated |
… | Each variable to be added with its respective aggregation formula. For instance, to add the mean of variable `GDPPC_PPP`, write `GDPPC_PPP = mean(GDPPC_PPP, na.rm = T)`. Then add the next variable after a comma. |
Reveal answer
emu_synth <- generate_predictor(emu_synth, time_window = 1980:1998,
CAB = mean(CAB, na.rm=T),
GDPPC_PPP = mean(GDPPC_PPP, na.rm=T),
openness = mean(openness, na.rm=T),
demand = mean(demand, na.rm=T),
x_price = mean(x_price, na.rm=T),
GDP_gr = mean(GDP_gr, na.rm=T),
invest = mean(invest, na.rm=T),
gov_debt = mean(gov_debt, na.rm=T),
gov_deficit = mean(gov_deficit, na.rm=T),
credit = mean(credit, na.rm=T))
# Note: the outcome variable should also be added as a predictor!
- Calculating the synthetic control weights:
With the data now prepared and ready, the weights for each donor unit can be estimated. This is done with `generate_weights()`. There are a number of details of the estimation that can be fine-tuned, but we will simply use the defaults. The only arguments you will need are listed below. Note: It can take a few minutes for this function to run, so be patient!
Argument | Description |
---|---|
`data` | The name of the object created with `synthetic_control()` and with the predictors added with `generate_predictor()` (here, `emu_synth`) |
`optimization_window` | The time window of the pre-treatment period to be used in the optimization task. Here we will use the entire pre-treatment period. |
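To build intuition for what the weight estimation is doing, here is a minimal sketch of the core idea: find non-negative donor weights that sum to one and bring the weighted donor predictors as close as possible to the treated unit's predictors. This is not the optimiser that `tidysynth` uses. It treats all predictors as equally important (so it ignores the \(v\) weights), and the softmax trick with `optim()` is simply a convenient assumption for a toy example. The matrix `X0` and the donors `A`, `B` and `C` are hypothetical.

```r
# Hypothetical predictor means: rows are predictors, columns are donor units.
X0 <- cbind(A = c(1, 10, 3),
            B = c(4, 20, 1),
            C = c(9,  5, 7))

# Toy treated unit, built as an exact 60/40 mix of donors A and B
X1 <- 0.6 * X0[, "A"] + 0.4 * X0[, "B"]

# Map unconstrained parameters to weights that are >= 0 and sum to 1
to_weights <- function(theta) exp(theta) / sum(exp(theta))

# Squared distance between treated and weighted donor predictors
loss <- function(theta) sum((X1 - X0 %*% to_weights(theta))^2)

# Start from equal weights and minimise the distance
start <- rep(0, ncol(X0))
fit   <- optim(start, loss, method = "BFGS")

# Estimated donor weights, labelled by donor
w <- setNames(to_weights(fit$par), colnames(X0))
round(w, 3)
```

Because the toy treated unit is an exact mix of donors A and B, a well-behaved optimiser should put most of the weight on those two donors and very little on C.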
- Calculating the synthetic control:
Now, we are finally ready to create the synthetic control unit (i.e. multiplying the outcomes of the donor units for each time period by their estimated weight and summing them up). This is done simply by running `generate_control()` on the object you created with the steps above.
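The weighted sum described in this step can be written out by hand. The sketch below uses made-up outcomes and made-up weights for three hypothetical donors, purely to show the arithmetic.

```r
# Hypothetical donor outcomes: rows are years, columns are donor units.
Y0 <- cbind(AUS = c(-4, -3, -5),
            CAN = c(-1,  0,  1),
            SWE = c( 2,  3,  4))
rownames(Y0) <- 1997:1999

# Made-up donor weights (non-negative, summing to one)
w <- c(AUS = 0.5, CAN = 0.3, SWE = 0.2)

# Synthetic outcome in each year: weighted sum across donors
synthetic <- drop(Y0[, names(w)] %*% w)

# Made-up outcomes for the treated unit, and the gap to its synthetic control
actual <- c(-2, -1, -4)
gap    <- actual - synthetic
```

For 1997, for example, the synthetic outcome is \(0.5 \times (-4) + 0.3 \times (-1) + 0.2 \times 2 = -1.9\). The gap in the post-treatment years is the estimated treatment effect.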
- Plotting the results:
Use tidysynth’s `plot_trends()` and `plot_differences()` functions on the `emu_synth` object to produce plots which compare Spain’s actual current account balance trend to that of the synthetic Spain you have just created. Interpret these plots. What do they suggest about the effect of the introduction of EMU on the Spanish current account balance? Hint: These functions create plots in the style of `ggplot2`, so they can be further customised like any `ggplot` with `+`.
- Interpreting the synthetic control unit:
A crucial strength of the synthetic control approach is that it allows us to be very transparent about the comparisons we are making when making causal inferences. In particular, we know that the synthetic Spain that we created above is a weighted average of the 15 OECD non-EMU countries in our data.
- What are the top five countries contributing to synthetic Spain? You can use the function `grab_unit_weights()` to extract the \(w\) weights.
- Which variables contribute the most to the synthetic control? Is the synthetic control unit closer to the treated unit in terms of the covariates than the sample mean? You can use the function `grab_predictor_weights()` to extract the \(v\) weights and `grab_balance_table()` for a table of the predictor means for the treated, synthetic and donor units.
- Estimating a placebo synthetic control treatment effect:
One way to check the validity of the synthetic control is to estimate “placebo” effects – i.e. effects for units that were not exposed to the treatment. In this question we will replicate the analysis above for Australia, which did not join EMU in 1999.
- In constructing synthetic Australia, we must exclude Spain – the actual treatment unit – from the analysis. Before you repeat the steps above for Australia, create a new data.frame that doesn’t include the Spanish observations. Hint: Here you will want to select all rows of the `emu` data for which the `country_ID` variable is not equal to `"ESP"`.
- Now repeat the steps above to estimate the synthetic control for Australia.
- What does the estimated treatment effect for Australia tell you about the validity of the design for estimating the treatment effect of the EMU on the Spanish current account balance?
- Compare the treatment effects from the Australian synthetic control analysis and the Spanish synthetic control analysis in terms of the pre- and post-treatment root mean square error values.
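Two pieces of this question can be sketched in base R: dropping the Spanish rows, and computing the root mean square prediction error (RMSPE) before and after treatment. All data below are made up, and the identifiers `"AUS"` and `"CAN"` as well as the helper `rmspe()` are hypothetical names used only for illustration.

```r
# Hypothetical toy panel; only the "ESP" identifier comes from the seminar data.
emu_toy <- data.frame(
  period     = rep(1998:1999, times = 3),
  country_ID = rep(c("ESP", "AUS", "CAN"), each = 2),
  CAB        = c(-1, -3, -5, -4, 1, 2)
)

# Keep every row whose country_ID is not "ESP"
emu_no_esp <- emu_toy[emu_toy$country_ID != "ESP", ]

# Root mean square prediction error between actual and synthetic outcomes
rmspe <- function(actual, synthetic) sqrt(mean((actual - synthetic)^2))

# Made-up actual and synthetic series for a single unit
fit <- data.frame(period    = 1997:2000,
                  actual    = c(1, 2, 3, 4),
                  synthetic = c(2, 1, 1, 0))

# Split at the treatment year and compute the error in each period
pre        <- fit$period < 1999
rmspe_pre  <- rmspe(fit$actual[pre],  fit$synthetic[pre])
rmspe_post <- rmspe(fit$actual[!pre], fit$synthetic[!pre])
```

A post-treatment RMSPE that is large relative to the pre-treatment RMSPE suggests that the unit diverged from its synthetic control after 1999. For a placebo unit such as Australia we would hope not to see that pattern.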
6.1.2 The Effect of Economic and Monetary Union on Austria’s Current Account Balance – Hope (2016)
In the full paper linked to on the reading list (and above), Hope conducts the synthetic control analysis for several countries, not just for Spain. One particularly interesting case that he evaluates is Austria. Your task is to replicate the analysis we have just completed but this time using Austria, and not Spain, as the unit of interest. You will notice that the data provided for the seminar does not include any information about Austria, but you can download an additional, part-completed, dataset from the “Seminar data 2” link at the top of this page.
You will also notice that this additional data is missing one crucial variable: the outcome. Because you do not have any outcome variable here for the new treated unit, you will need to collect this yourself. In Hope’s paper, he suggests that the data on each country’s current account balance (measured as a % of GDP) can be found in the IMF World Economic Outlook Database, October 2015. You should be able to find the relevant data at this link. Your first task is to retrieve this information for Austria for the years 1980 to 2010, and to include that information in the Austria data set.
Once you have found the data and entered it into your downloaded csv file, you need to load both the Austria data and the main dataset from the seminar and combine them. To do so, you could use the `rbind` function that we introduced in last week’s seminar.
emu <- read.csv("data/hope_emu.csv")
austria <- read.csv("data/hope_emu_austria.csv")
austria$CAB <- c(...) # here you should enter the vector of CAB values
# Stack the data.frames on top of one another
emu_new <- rbind(emu, austria)
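When stacking data.frames, `rbind` matches columns by name, so both objects need the same set of column names before they can be combined. The sketch below shows a quick check on made-up data. The objects `emu_toy` and `austria_toy`, the `"AUT"` identifier and all CAB values are hypothetical.

```r
# Hypothetical toy versions of the two data sets; values are invented.
emu_toy     <- data.frame(period = 1998:1999, country_ID = "ESP",
                          CAB = c(-1.2, -2.9))
austria_toy <- data.frame(period = 1998:1999, country_ID = "AUT")

# The new outcome vector must have one value per row of the Austria data
cab_values <- c(-1.6, -1.7)
stopifnot(length(cab_values) == nrow(austria_toy))
austria_toy$CAB <- cab_values

# Check that both data.frames share the same columns before stacking
stopifnot(setequal(names(emu_toy), names(austria_toy)))

# Stack them, with columns in the order of the main data set
combined <- rbind(emu_toy, austria_toy[, names(emu_toy)])
```

If the column names differ (for example, because of a typo in the csv header), `rbind` stops with an error, so checking the names first makes the problem easier to diagnose.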
- Synthetic control for Austria:
You should now re-estimate the synthetic control method, this time using Austria as the unit of interest. Remember that you must ensure that you are not using Spain as a part of the donor pool. Then answer the following questions:
- Which 5 countries receive the highest weight as a part of synthetic Austria?
- Produce a plot which compares the current account balance of Austria to that of synthetic Austria. How does this compare to the equivalent plot of Spain?
- Retrieve the vector of weights assigned to each of the variables used to construct synthetic Austria. Which variables contribute most to the synthetic control?
- Permutation inference:
Conduct full permutation inference by estimating placebo treatment effects for all of the control units, and comparing them to the actual estimated effect for Spain (and/or Austria). This means calculating the MSPE ratios for each unit, and providing a plot summarising these statistics. This is where `tidysynth` is really useful: rather than having to do the permutation inference by hand (i.e. calculating the placebo and the MSPE ratio for each donor unit, as we did for Australia), you can simply redo steps 2 to 5 of the Spain exercise, only now specifying `generate_placebos = TRUE` in the first of those steps, as shown when clicking on ‘Reveal answer’. Then, you can look at the results of the permutation inference (the pre- and post-treatment MSPE, their ratio and the corresponding p-values) by running `grab_significance(emu_synth_placebo)`. Or, if you prefer to display these graphically (which is generally easier to digest), you can use `plot_placebos()` as well as `plot_mspe_ratio()` to explore whether the results we (and the paper) find for Spain are likely to have occurred by chance.
Reveal answer
## prepare data
emu_synth_placebo <- synthetic_control(data = emu,
outcome = CAB,
unit = country_ID,
time = period,
i_unit = "ESP",
i_time = 1999,
generate_placebos = T)
## add predictors
emu_synth_placebo <- generate_predictor(emu_synth_placebo, time_window = 1980:1998,
CAB = mean(CAB, na.rm=T),
GDPPC_PPP = mean(GDPPC_PPP, na.rm=T),
openness = mean(openness, na.rm=T),
demand = mean(demand, na.rm=T),
x_price = mean(x_price, na.rm=T),
GDP_gr = mean(GDP_gr, na.rm=T),
invest = mean(invest, na.rm=T),
gov_debt = mean(gov_debt, na.rm=T),
gov_deficit = mean(gov_deficit, na.rm=T),
credit = mean(credit, na.rm=T))
## estimate weights
emu_synth_placebo <- generate_weights(emu_synth_placebo, optimization_window = 1980:1998)
## calculate SC
emu_synth_placebo <- generate_control(emu_synth_placebo)
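The logic behind the permutation p-value can also be sketched by hand. The table below is entirely made up (the unit names other than `"ESP"` and all MSPE values are hypothetical). It illustrates the general idea of ranking units by their post/pre MSPE ratio and is not a reproduction of the output of `grab_significance()`.

```r
# Hypothetical pre- and post-treatment MSPE for the treated unit and placebos
mspe <- data.frame(
  unit      = c("ESP", "AUS", "CAN", "JPN", "SWE"),
  pre_mspe  = c(1, 2, 4, 1, 5),
  post_mspe = c(20, 4, 4, 3, 5)
)

# Ratio of post- to pre-treatment fit: large values signal a post-treatment break
mspe$ratio <- mspe$post_mspe / mspe$pre_mspe

# Share of units whose ratio is at least as large as the treated unit's
treated_ratio <- mspe$ratio[mspe$unit == "ESP"]
p_value <- mean(mspe$ratio >= treated_ratio)
```

In this toy table the treated unit has the largest ratio of the five units, so the p-value is 1/5. More generally, with one treated unit and 15 donors the smallest p-value this procedure can produce is 1/16, which is worth keeping in mind when interpreting the results.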