# 5 Difference-in-Differences

## 5.1 Lecture review and reading

Chapter 5 in MHE gives a very good treatment of the main empirical issues associated with difference-in-differences analysis, and with panel data more generally. I also thought that the relevant chapter in Mastering ’Metrics was especially good this week, so both are worth consulting if you are thinking of applying this type of method in your own work.

The papers by Card (1990) and Card and Krueger (1994) are classics in the diff-in-diff literature, and give – I think – a very intuitive overview of the basics behind the method. More advanced applications can be found in the paper by Ladd and Lenz (2009), which also provides a useful demonstration of how difference-in-differences analyses can be combined with matching in order to make the parallel trends assumption more plausible, and the recent paper by Dinas et al. (2018), which we will replicate in part below.

For those of you who are feeling very committed, an important paper on statistical inference for difference-in-differences models is the one by Bertrand et al. (2004). Be warned, however, that essentially no-one has ever enjoyed time spent reading a paper that is almost entirely about standard errors.

## 5.2 Seminar

This week we will be learning how to implement a variety of difference-in-differences estimators, using both linear regression and fixed-effects regressions. We will also spend time learning more about R’s plotting functions, as visually inspecting the data is one of the best ways of assessing the plausibility of the “parallel trends” assumption that is at the heart of all difference-in-differences analyses.

### 5.2.2 Refugees and support for the far right – Dinas et al. (2018)

The recent refugee crisis in Europe has coincided with a period of electoral politics in which right-wing extremist parties have performed well in many European countries. However, despite this aggregate-level correlation, we have surprisingly little causal evidence on the link between influxes of refugees and the attitudes and behaviour of native populations. What is the causal relationship between refugee crises and support for far-right political parties? Dinas et al. (2018) examine evidence from the Greek case. Making use of the fact that some Greek islands (those close to the Turkish border) witnessed sudden and unexpected increases in the number of refugees during the summer of 2015, while other nearby Greek islands saw much more moderate inflows of refugees, Dinas et al. use a difference-in-differences analysis to assess whether treated municipalities were more supportive of the far-right Golden Dawn party in the September 2015 general election. We will examine the data from this paper, replicating the main parts of their difference-in-differences analysis.

The `dinas_golden_dawn.Rdata` file contains data on 96 Greek municipalities and 4 elections (2012, 2013, 2015, and the treatment year 2016). The `muni` data.frame contained within that file includes the following variables:

- `treatment` – binary (1 if the observation is in the treatment group (a municipality that received many refugees) **and** the observation is in the post-treatment period (i.e. in 2016); untreated units, and treated units in the pre-treatment periods, are coded as zero)
- `ever_treated` – binary (equal to 1 in all periods for all treated municipalities, and equal to 0 in all periods for all control municipalities)
- `trarrprop` – continuous (per capita number of refugees arriving in each municipality)
- `gdvote` – the outcome of interest: the Golden Dawn’s share of the vote (continuous)
- `year` – the year of the election (can take 4 values: 2012, 2013, 2015, and 2016)

Use the `load` function to load the downloaded data into R now.

`load("dinas_golden_dawn.Rdata")`

**Question 1.** *Using only the observations from the post-treatment period (i.e. 2016), implement a regression which compares the Golden Dawn share of the vote for the treated and untreated municipalities. Does the coefficient on this regression represent the average treatment effect on the treated? If so, why? If not, why not?*

## Reveal answer

```
post_treatment_data <- muni[muni$year == 2016,]
post_treatment_period_regression <- lm(gdvote ~ treatment, data = post_treatment_data)
summary(post_treatment_period_regression)
```

```
Call:
lm(formula = gdvote ~ treatment, data = post_treatment_data)
Residuals:
Min 1Q Median 3Q Max
-5.659 -1.904 -0.091 1.426 9.109
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.6591 0.2513 22.517 < 2e-16 ***
treatment 2.7448 0.7109 3.861 0.000207 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.303 on 94 degrees of freedom
Multiple R-squared: 0.1369, Adjusted R-squared: 0.1277
F-statistic: 14.91 on 1 and 94 DF, p-value: 0.000207
```

The `treatment` variable is a dummy equal to 1 for treated observations in the post-treatment period, and zero otherwise. This means that the regression estimated above is simply the difference in means (for support for the Golden Dawn) between the treatment group (municipalities that witnessed large inflows of refugees) and the control group (municipalities that did not receive large numbers of refugees). The difference in means is positive and significant: treated municipalities were on average roughly 2.7 percentage points more supportive of the Golden Dawn than non-treated municipalities in the post-treatment period. Notice that for this simple analysis, you would have produced exactly the same result by using the `ever_treated` variable rather than the `treatment` variable, because the two variables coincide once we restrict the data to 2016.

However, because the treatment was not assigned at random, we have little reason to believe that this difference in means would identify the causal effect of interest. The treated and control municipalities might very well have different potential outcomes, and so selection bias is – as always – a concern.

**Question 2.** *Calculate the sample difference-in-differences between 2015 and 2016. For this question, you should calculate the relevant differences “manually”, in that you should use the mean function to construct the appropriate comparisons. What does this calculation imply about the average treatment effect on the treated?*

## Reveal answer

```
# Calculate the difference in means between treatment and control in the POST-treatment period
post_difference <- mean(muni$gdvote[muni$ever_treated == T & muni$year == 2016]) - mean(muni$gdvote[muni$ever_treated == F & muni$year == 2016])
# Calculate the difference in means between treatment and control in the PRE-treatment period
pre_difference <- mean(muni$gdvote[muni$ever_treated == T & muni$year == 2015]) - mean(muni$gdvote[muni$ever_treated == F & muni$year == 2015])
# Calculate the difference-in-differences
diff_in_diff <- post_difference - pre_difference
```

Note that because the `treatment` variable only indicates differences in treatment status during the post-treatment period, here we need to use the `ever_treated` variable to define the difference in means in the pre-treatment period.

The difference in means in the post-treatment period (2.74) is larger than the difference in means for the pre-treatment period (0.62), implying that the average treatment effect on the treated municipalities is positive. In simple terms, the difference-in-differences implies that the refugee crisis increased support for the Golden Dawn amongst treated municipalities by roughly 2 percentage points, on average.
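As a quick check on the arithmetic, the same calculation can be reproduced from the four group-by-period means reported in the `aggregate` output for Question 5. This is only a sketch: the means are typed in by hand from that output, and the object names are illustrative rather than part of the original solution.

```r
# Group means of gdvote, copied by hand from the aggregate() output in Question 5
treated_2016 <- 8.403935
control_2016 <- 5.659097
treated_2015 <- 5.006637
control_2015 <- 4.385363

# Treated-minus-control gap in each period
post_difference <- treated_2016 - control_2016
pre_difference <- treated_2015 - control_2015

# Difference-in-differences: how much the gap changed between 2015 and 2016
diff_in_diff <- post_difference - pre_difference

round(c(post = post_difference, pre = pre_difference, did = diff_in_diff), 2)
```

The three values round to 2.74, 0.62 and 2.12. Rearranging the terms as (change for treated) minus (change for control) gives the same number, which is why the estimator is called a difference in differences.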

**Question 3.** *Use a linear regression with an appropriate interaction term to estimate the difference-in-differences. You will need to convert the year variable into an appropriate dummy variable (i.e. where observations in the post-treatment period are coded as 1 and observations in the pre-treatment period are coded as 0). For this question, you should again focus only on the years 2015 and 2016.*

## Reveal answer

```
## Subset the data to observations in either 2015 or 2016
muni_1516 <- muni[muni$year >= 2015,]
## Construct a dummy variable for the post-treatment period. Note that the way I have constructed the variable in R means it is stored as a logical vector (of TRUE and FALSE observations) rather than a numeric vector. R treats logical vectors as dummy variables, with TRUE being equal to 1 and FALSE being equal to 0.
muni_1516$post_treatment <- muni_1516$year == 2016
# Calculate the difference-in-differences
interaction_model <- lm(gdvote ~ ever_treated * post_treatment, data = muni_1516)
summary(interaction_model)
```

```
Call:
lm(formula = gdvote ~ ever_treated * post_treatment, data = muni_1516)
Residuals:
Min 1Q Median 3Q Max
-5.6591 -1.6784 -0.2008 1.3713 9.1088
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.3854 0.2415 18.162 < 2e-16 ***
ever_treatedTRUE 0.6213 0.6829 0.910 0.364147
post_treatmentTRUE 1.2737 0.3415 3.730 0.000253 ***
ever_treatedTRUE:post_treatmentTRUE 2.1236 0.9658 2.199 0.029120 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.213 on 188 degrees of freedom
Multiple R-squared: 0.1762, Adjusted R-squared: 0.163
F-statistic: 13.4 on 3 and 188 DF, p-value: 5.795e-08
```

The regression analysis gives us, of course, the same answer as the difference in means that we calculated above. The coefficient on the interaction term between `ever_treated` and `post_treatment` is 2.12. Fortunately, regression also provides us with standard errors, and we can see that the interaction term is statistically significant at the 5% level (p ≈ 0.03).
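To see why the two approaches must agree, it can help to check the equivalence on data where we know the answer. The sketch below uses a small simulated two-period panel (all variable names and parameter values are made up for illustration and have nothing to do with the Dinas et al. data) and compares the "manual" difference-in-differences with the interaction coefficient.

```r
set.seed(123)
n_per_group <- 40
n_units <- 2 * n_per_group

# Hypothetical two-period panel: 40 treated and 40 control units
sim <- data.frame(
  unit = rep(seq_len(n_units), times = 2),
  post_treatment = rep(c(FALSE, TRUE), each = n_units)
)
sim$ever_treated <- sim$unit <= n_per_group

# Made-up outcome with a true treatment effect of 2
sim$y <- 4 + 0.5 * sim$ever_treated + 1 * sim$post_treatment +
  2 * (sim$ever_treated & sim$post_treatment) + rnorm(nrow(sim))

# Difference-in-differences computed from the four group means
group_mean <- function(treated, post) {
  mean(sim$y[sim$ever_treated == treated & sim$post_treatment == post])
}
manual_did <- (group_mean(TRUE, TRUE) - group_mean(FALSE, TRUE)) -
  (group_mean(TRUE, FALSE) - group_mean(FALSE, FALSE))

# The same quantity from the interaction term of the regression
sim_model <- lm(y ~ ever_treated * post_treatment, data = sim)
regression_did <- unname(coef(sim_model)["ever_treatedTRUE:post_treatmentTRUE"])

all.equal(manual_did, regression_did)
```

Because the interaction model has one parameter for each of the four group-by-period cells, its fitted values are exactly the four cell means, so the interaction coefficient reproduces the manual calculation up to floating-point error.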

**Question 4.** *All difference-in-difference analyses rely on the “parallel trends” assumption. What does this assumption mean? What does it imply in this particular analysis?*

## Reveal answer

The parallel trends assumption requires us to believe that, in the absence of treatment, the treated and untreated units would have followed similar changes in the dependent variable. Another way of stating the assumption is that selection bias between treatment and control units must be stable over time (i.e. there can be no time varying confounders).

In this particular example, this assumption suggests that, in the absence of the refugee crisis, treated and control municipalities would have experienced similar changes in the level of support for the Golden Dawn in the 2016 election.
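One way to write the assumption down for the 2015 to 2016 comparison is in potential-outcomes notation. The symbols here are illustrative shorthand rather than notation taken from the paper: $Y_{i,t}(0)$ is the Golden Dawn vote share municipality $i$ would have recorded in year $t$ without the refugee influx, and $D_i = 1$ marks the treated municipalities.

$$
E\left[Y_{i,2016}(0) - Y_{i,2015}(0) \mid D_i = 1\right] = E\left[Y_{i,2016}(0) - Y_{i,2015}(0) \mid D_i = 0\right]
$$

If this holds, the observed change in the control group can stand in for the unobserved counterfactual change in the treated group, and the difference-in-differences identifies the ATT:

$$
\text{ATT} = \left\{E[Y_{i,2016} \mid D_i = 1] - E[Y_{i,2015} \mid D_i = 1]\right\} - \left\{E[Y_{i,2016} \mid D_i = 0] - E[Y_{i,2015} \mid D_i = 0]\right\}
$$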

**Question 5.** *Assess the parallel trends assumption by plotting the evolution of the outcome variable for both the treatment and control observations over time. Are you convinced that the parallel trends assumption is reasonable in this application?*

## Reveal answer

There are a number of ways to calculate the average outcome for treated and control units in each time period, and then to plot them on a graph. The solution below uses the `aggregate` function, though it would also be possible to calculate each of the values manually and then store them in a data.frame for plotting.

```
group_period_averages <- aggregate(x = muni$gdvote,
by = list(muni$year, muni$ever_treated),
FUN = mean)
names(group_period_averages) <- c("year", "treated", "gdvote")
group_period_averages
```

```
year treated gdvote
1 2012 FALSE 5.491237
2 2013 FALSE 5.339475
3 2015 FALSE 4.385363
4 2016 FALSE 5.659097
5 2012 TRUE 6.159583
6 2013 TRUE 6.023365
7 2015 TRUE 5.006637
8 2016 TRUE 8.403935
```

Similarly, there are many ways to plot the values that you have estimated. You should use this question as a way of becoming familiar with the main `plot` function in R. Although we have used this function before, there are many options that you can specify to customise the graphics that you produce. I have included a few below, and have made comments next to the code to aid understanding. I have also used the `lines` function below. This is similar to `plot`, and the main arguments it takes are an x and y vector.

```
plot(x = group_period_averages$year, # x specifies the variable plotted on the x-axis
y = group_period_averages$gdvote, # y specifies the variable plotted on the y-axis
col = ifelse(group_period_averages$treated, "red", "blue"), # col denotes the colour of the points
pch = 19, # pch determines the shape of the points in the plot
xlab = "Year", # x-axis label
ylab = "GD vote share", # y-axis label
main = "Parallel trends?") # Plot title
lines(x = group_period_averages$year[group_period_averages$treated == T],
y = group_period_averages$gdvote[group_period_averages$treated == T],
col = "red")
lines(x = group_period_averages$year[group_period_averages$treated == F],
y = group_period_averages$gdvote[group_period_averages$treated == F],
col = "blue")
```

The plot reveals that the parallel trends assumption seems very reasonable in this application. The vote share for the Golden Dawn evolves in parallel for both the treated and untreated municipalities throughout the three pre-treatment elections, and then diverges noticeably in the post-treatment period. This is encouraging, as it lends significant support to the crucial identifying assumption in the analysis.
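The visual impression can be backed up with a simple numerical check: if trends are parallel before treatment, the treated-minus-control gap should barely move between pre-treatment elections. The sketch below re-enters the group means from the `aggregate` output above by hand (the object names are illustrative) and computes the change in the gap from one election to the next, which amounts to a "placebo" difference-in-differences for each pre-treatment pair of years.

```r
# Group-by-period means of gdvote, copied by hand from the aggregate() output
group_period_means <- data.frame(
  year = rep(c(2012, 2013, 2015, 2016), times = 2),
  treated = rep(c(FALSE, TRUE), each = 4),
  gdvote = c(5.491237, 5.339475, 4.385363, 5.659097,
             6.159583, 6.023365, 5.006637, 8.403935)
)

# Treated-minus-control gap in each election
gap <- group_period_means$gdvote[group_period_means$treated] -
  group_period_means$gdvote[!group_period_means$treated]
names(gap) <- c(2012, 2013, 2015, 2016)

# Change in the gap between consecutive elections
gap_changes <- diff(gap)
round(gap_changes, 3)
```

The gap changes by about 0.02 between 2012 and 2013 and by about -0.06 between 2013 and 2015, compared with about 2.12 between 2015 and 2016. This is a descriptive check only: it says nothing about sampling uncertainty, and parallel pre-trends do not prove that trends would have stayed parallel in 2016.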

**Question 6.** *Use a fixed-effects regression to estimate the difference-in-differences. Remember that the fixed-effect estimator for the diff-in-diff model requires “two-way” fixed-effects, i.e. sets of dummy variables for a) units and b) time periods. In R, you do not need to construct such dummy variables manually. It is sufficient to use the as.factor function within the lm function to tell R to treat a certain variable as a set of dummies. (So, here, try as.factor(municipality) and as.factor(year)). You will also need to decide which of the two treatment variables (treatment or ever_treated) is appropriate for this analysis (if you are struggling, look back at the lecture notes!)*

## Reveal answer

```
fixed_effect_model <- lm(gdvote ~ as.factor(municipality) + as.factor(year) + treatment,
data = muni)
summary(fixed_effect_model)
```

```
Call:
lm(formula = gdvote ~ as.factor(municipality) + as.factor(year) +
treatment, data = muni)
Residuals:
Min 1Q Median 3Q Max
-4.5964 -0.5206 -0.0090 0.4459 7.0013
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.847302 0.560514 8.648 3.93e-16 ***
as.factor(municipality)2 -0.960891 0.786565 -1.222 0.222861
as.factor(municipality)3 -1.254466 0.780395 -1.607 0.109062
as.factor(municipality)4 0.572188 0.780395 0.733 0.464039
as.factor(municipality)5 -1.154000 0.780395 -1.479 0.140319
as.factor(municipality)6 6.989895 0.780395 8.957 < 2e-16 ***
as.factor(municipality)7 2.736592 0.780395 3.507 0.000527 ***
as.factor(municipality)8 0.539248 0.780395 0.691 0.490134
as.factor(municipality)9 -1.826673 0.780395 -2.341 0.019940 *
as.factor(municipality)10 0.572991 0.780395 0.734 0.463413
as.factor(municipality)11 0.381896 0.780395 0.489 0.624963
as.factor(municipality)12 4.656487 0.780395 5.967 7.19e-09 ***
as.factor(municipality)13 -0.762112 0.780395 -0.977 0.329612
as.factor(municipality)14 -4.254392 0.780395 -5.452 1.09e-07 ***
as.factor(municipality)15 -0.043780 0.780395 -0.056 0.955302
as.factor(municipality)16 -2.366577 0.780395 -3.033 0.002649 **
as.factor(municipality)17 -1.978901 0.780395 -2.536 0.011756 *
as.factor(municipality)18 -2.973346 0.780395 -3.810 0.000170 ***
as.factor(municipality)19 2.337939 0.780395 2.996 0.002978 **
as.factor(municipality)20 -1.153299 0.780395 -1.478 0.140559
as.factor(municipality)21 4.016210 0.780395 5.146 4.96e-07 ***
as.factor(municipality)22 1.915856 0.780395 2.455 0.014689 *
as.factor(municipality)23 4.973244 0.780395 6.373 7.48e-10 ***
as.factor(municipality)24 1.208758 0.780395 1.549 0.122518
as.factor(municipality)25 -1.474112 0.780395 -1.889 0.059920 .
as.factor(municipality)26 1.427820 0.780395 1.830 0.068356 .
as.factor(municipality)27 3.489825 0.780395 4.472 1.12e-05 ***
as.factor(municipality)28 -1.678553 0.780395 -2.151 0.032328 *
as.factor(municipality)29 0.494295 0.780395 0.633 0.526989
as.factor(municipality)30 1.606365 0.780395 2.058 0.040464 *
as.factor(municipality)31 -1.859274 0.780395 -2.382 0.017855 *
as.factor(municipality)32 2.981003 0.780395 3.820 0.000164 ***
as.factor(municipality)33 -0.734275 0.786565 -0.934 0.351344
as.factor(municipality)34 -0.502034 0.780395 -0.643 0.520544
as.factor(municipality)35 2.125504 0.780395 2.724 0.006857 **
as.factor(municipality)36 5.227247 0.780395 6.698 1.13e-10 ***
as.factor(municipality)37 -0.377504 0.780395 -0.484 0.628947
as.factor(municipality)38 0.147724 0.780395 0.189 0.849998
as.factor(municipality)39 2.261062 0.780395 2.897 0.004057 **
as.factor(municipality)40 2.850464 0.780395 3.653 0.000309 ***
as.factor(municipality)41 -1.118071 0.780395 -1.433 0.153044
as.factor(municipality)42 0.847391 0.780395 1.086 0.278467
as.factor(municipality)43 1.096248 0.780395 1.405 0.161193
as.factor(municipality)44 -0.230902 0.780395 -0.296 0.767540
as.factor(municipality)45 1.241767 0.780395 1.591 0.112677
as.factor(municipality)46 2.854977 0.786565 3.630 0.000336 ***
as.factor(municipality)47 -0.429726 0.786565 -0.546 0.585267
as.factor(municipality)48 1.223751 0.786565 1.556 0.120865
as.factor(municipality)49 0.507119 0.786565 0.645 0.519625
as.factor(municipality)50 1.020418 0.780395 1.308 0.192079
as.factor(municipality)51 1.331549 0.780395 1.706 0.089055 .
as.factor(municipality)52 -1.452287 0.780395 -1.861 0.063783 .
as.factor(municipality)53 1.540263 0.780395 1.974 0.049385 *
as.factor(municipality)54 -0.468783 0.780395 -0.601 0.548520
as.factor(municipality)55 9.426776 0.786565 11.985 < 2e-16 ***
as.factor(municipality)56 -0.124964 0.780395 -0.160 0.872893
as.factor(municipality)57 -1.343525 0.780395 -1.722 0.086232 .
as.factor(municipality)58 2.217317 0.780395 2.841 0.004819 **
as.factor(municipality)59 -1.571326 0.780395 -2.013 0.045005 *
as.factor(municipality)60 1.346591 0.780395 1.726 0.085521 .
as.factor(municipality)61 -1.775431 0.780395 -2.275 0.023649 *
as.factor(municipality)62 -1.394833 0.780395 -1.787 0.074949 .
as.factor(municipality)63 -1.934248 0.780395 -2.479 0.013773 *
as.factor(municipality)64 1.349520 0.780395 1.729 0.084846 .
as.factor(municipality)65 1.226398 0.780395 1.572 0.117178
as.factor(municipality)66 0.611517 0.780395 0.784 0.433928
as.factor(municipality)67 1.123973 0.780395 1.440 0.150895
as.factor(municipality)68 3.912280 0.780395 5.013 9.44e-07 ***
as.factor(municipality)69 -0.348744 0.780395 -0.447 0.655302
as.factor(municipality)70 1.836924 0.780395 2.354 0.019262 *
as.factor(municipality)71 7.365582 0.780395 9.438 < 2e-16 ***
as.factor(municipality)72 1.251699 0.780395 1.604 0.109841
as.factor(municipality)73 2.309920 0.786565 2.937 0.003589 **
as.factor(municipality)74 0.395801 0.780395 0.507 0.612422
as.factor(municipality)75 -1.921374 0.780395 -2.462 0.014409 *
as.factor(municipality)76 -2.009183 0.780395 -2.575 0.010543 *
as.factor(municipality)77 0.130021 0.780395 0.167 0.867796
as.factor(municipality)78 5.006388 0.780395 6.415 5.86e-10 ***
as.factor(municipality)79 -0.354066 0.780395 -0.454 0.650391
as.factor(municipality)80 2.748890 0.780395 3.522 0.000498 ***
as.factor(municipality)81 1.456558 0.780395 1.866 0.063011 .
as.factor(municipality)82 2.236038 0.786565 2.843 0.004796 **
as.factor(municipality)83 0.199033 0.780395 0.255 0.798875
as.factor(municipality)84 0.819445 0.780395 1.050 0.294594
as.factor(municipality)85 -0.001966 0.786565 -0.002 0.998008
as.factor(municipality)86 -1.582262 0.780395 -2.028 0.043544 *
as.factor(municipality)87 2.851889 0.780395 3.654 0.000307 ***
as.factor(municipality)88 -0.632180 0.780395 -0.810 0.418574
as.factor(municipality)89 -2.077799 0.780395 -2.662 0.008199 **
as.factor(municipality)90 -2.683354 0.780395 -3.438 0.000673 ***
as.factor(municipality)91 -0.706463 0.786565 -0.898 0.369860
as.factor(municipality)92 4.190118 0.780395 5.369 1.65e-07 ***
as.factor(municipality)93 0.378094 0.780395 0.484 0.628411
as.factor(municipality)94 -0.367786 0.780395 -0.471 0.637801
as.factor(municipality)95 -0.088245 0.786565 -0.112 0.910752
as.factor(municipality)96 0.242690 0.780395 0.311 0.756041
as.factor(year)2013 -0.149819 0.159298 -0.940 0.347762
as.factor(year)2015 -1.111758 0.159298 -6.979 2.10e-11 ***
as.factor(year)2016 0.166546 0.166711 0.999 0.318639
treatment 2.087002 0.393282 5.307 2.25e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.104 on 284 degrees of freedom
Multiple R-squared: 0.8663, Adjusted R-squared: 0.8197
F-statistic: 18.58 on 99 and 284 DF, p-value: < 2.2e-16
```

The key coefficient in this output is the one associated with the `treatment` indicator. Given the fixed-effects, this coefficient represents the difference-in-differences that is our focus. This model indicates an ATT of 2.087, which is very close to our estimates from the questions above (the small differences can be accounted for by the fact that we are now using *all* the pre-treatment periods, rather than just 2015).
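To make the link between the two-way fixed-effects coefficient and the simple difference-in-differences concrete, the sketch below simulates a balanced four-period panel in which all treated units are treated in the final period only. Everything here is hypothetical (unit counts, effect sizes and variable names are made up); the point is only that, in this balanced setting, the fixed-effects coefficient equals a difference-in-differences that averages over all pre-treatment periods.

```r
set.seed(42)
n_units <- 60
years <- c(2012, 2013, 2015, 2016)

# Hypothetical balanced panel: the first 20 units are treated, and only in 2016
panel <- data.frame(
  unit = rep(seq_len(n_units), each = length(years)),
  year = rep(years, times = n_units)
)
panel$ever_treated <- panel$unit <= 20
panel$treatment <- as.numeric(panel$ever_treated & panel$year == 2016)

# Made-up outcome: unit effects, year effects and a true treatment effect of 2
unit_effect <- rnorm(n_units)
year_effect <- c(0, -0.2, -1, 0.3)
panel$y <- unit_effect[panel$unit] + year_effect[match(panel$year, years)] +
  2 * panel$treatment + rnorm(nrow(panel), sd = 0.5)

# Two-way fixed-effects estimate
twfe <- lm(y ~ as.factor(unit) + as.factor(year) + treatment, data = panel)
twfe_estimate <- unname(coef(twfe)["treatment"])

# Difference-in-differences, averaging over all three pre-treatment years
post <- panel$year == 2016
change_treated <- mean(panel$y[panel$ever_treated & post]) -
  mean(panel$y[panel$ever_treated & !post])
change_control <- mean(panel$y[!panel$ever_treated & post]) -
  mean(panel$y[!panel$ever_treated & !post])
manual_did <- change_treated - change_control

all.equal(twfe_estimate, manual_did)
```

Applying the same averaging to the group means reported in Question 5 gives (8.404 − 5.730) − (5.659 − 5.072) ≈ 2.087, which matches the `treatment` coefficient above.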

**Question 7.** *Using the same model that you implemented in question 6, swap the treatment variable for the trarrprop variable, which is a continuous treatment variable measuring the number of refugee arrivals per capita. What is the estimated average treatment effect on the treated using this variable?*

## Reveal answer

```
fixed_effect_model_2 <- lm(gdvote ~ as.factor(municipality) + as.factor(year) + trarrprop,
data = muni)
summary(fixed_effect_model_2)
```

```
Call:
lm(formula = gdvote ~ as.factor(municipality) + as.factor(year) +
trarrprop, data = muni)
Residuals:
Min 1Q Median 3Q Max
-4.5843 -0.5275 0.0154 0.4563 7.0134
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.83519 0.56722 8.524 9.34e-16 ***
as.factor(municipality)2 -1.19678 0.80692 -1.483 0.139151
as.factor(municipality)3 -1.25447 0.78976 -1.588 0.113309
as.factor(municipality)4 0.57219 0.78976 0.725 0.469352
as.factor(municipality)5 -1.15400 0.78976 -1.461 0.145069
as.factor(municipality)6 6.98989 0.78976 8.851 < 2e-16 ***
as.factor(municipality)7 2.73659 0.78976 3.465 0.000612 ***
as.factor(municipality)8 0.53925 0.78976 0.683 0.495292
as.factor(municipality)9 -1.82667 0.78976 -2.313 0.021443 *
as.factor(municipality)10 0.57299 0.78976 0.726 0.468729
as.factor(municipality)11 0.38190 0.78976 0.484 0.629072
as.factor(municipality)12 4.65649 0.78976 5.896 1.06e-08 ***
as.factor(municipality)13 -0.76211 0.78976 -0.965 0.335372
as.factor(municipality)14 -4.25439 0.78976 -5.387 1.51e-07 ***
as.factor(municipality)15 -0.04378 0.78976 -0.055 0.955831
as.factor(municipality)16 -2.36658 0.78976 -2.997 0.002972 **
as.factor(municipality)17 -1.97890 0.78976 -2.506 0.012783 *
as.factor(municipality)18 -2.97335 0.78976 -3.765 0.000203 ***
as.factor(municipality)19 2.33794 0.78976 2.960 0.003333 **
as.factor(municipality)20 -1.15330 0.78976 -1.460 0.145312
as.factor(municipality)21 4.01621 0.78976 5.085 6.69e-07 ***
as.factor(municipality)22 1.91586 0.78976 2.426 0.015897 *
as.factor(municipality)23 4.97324 0.78976 6.297 1.15e-09 ***
as.factor(municipality)24 1.20876 0.78976 1.531 0.127001
as.factor(municipality)25 -1.47411 0.78976 -1.867 0.063001 .
as.factor(municipality)26 1.42782 0.78976 1.808 0.071681 .
as.factor(municipality)27 3.48982 0.78976 4.419 1.41e-05 ***
as.factor(municipality)28 -1.67855 0.78976 -2.125 0.034420 *
as.factor(municipality)29 0.49429 0.78976 0.626 0.531898
as.factor(municipality)30 1.60636 0.78976 2.034 0.042885 *
as.factor(municipality)31 -1.85927 0.78976 -2.354 0.019245 *
as.factor(municipality)32 2.98100 0.78976 3.775 0.000195 ***
as.factor(municipality)33 -0.34451 0.79029 -0.436 0.663221
as.factor(municipality)34 -0.50203 0.78976 -0.636 0.525499
as.factor(municipality)35 2.12550 0.78976 2.691 0.007540 **
as.factor(municipality)36 5.22725 0.78976 6.619 1.81e-10 ***
as.factor(municipality)37 -0.37750 0.78976 -0.478 0.633020
as.factor(municipality)38 0.14772 0.78976 0.187 0.851756
as.factor(municipality)39 2.26106 0.78976 2.863 0.004511 **
as.factor(municipality)40 2.85046 0.78976 3.609 0.000363 ***
as.factor(municipality)41 -1.11807 0.78976 -1.416 0.157960
as.factor(municipality)42 0.84739 0.78976 1.073 0.284197
as.factor(municipality)43 1.09625 0.78976 1.388 0.166205
as.factor(municipality)44 -0.23090 0.78976 -0.292 0.770218
as.factor(municipality)45 1.24177 0.78976 1.572 0.116990
as.factor(municipality)46 2.99804 0.79408 3.775 0.000195 ***
as.factor(municipality)47 -0.25789 0.85375 -0.302 0.762821
as.factor(municipality)48 1.34803 0.79452 1.697 0.090860 .
as.factor(municipality)49 0.34916 0.80360 0.434 0.664263
as.factor(municipality)50 1.02042 0.78976 1.292 0.197390
as.factor(municipality)51 1.33155 0.78976 1.686 0.092894 .
as.factor(municipality)52 -1.45229 0.78976 -1.839 0.066978 .
as.factor(municipality)53 1.54026 0.78976 1.950 0.052128 .
as.factor(municipality)54 -0.46878 0.78976 -0.594 0.553270
as.factor(municipality)55 9.44871 0.79728 11.851 < 2e-16 ***
as.factor(municipality)56 -0.12496 0.78976 -0.158 0.874388
as.factor(municipality)57 -1.34352 0.78976 -1.701 0.090007 .
as.factor(municipality)58 2.21732 0.78976 2.808 0.005338 **
as.factor(municipality)59 -1.57133 0.78976 -1.990 0.047595 *
as.factor(municipality)60 1.34659 0.78976 1.705 0.089280 .
as.factor(municipality)61 -1.77543 0.78976 -2.248 0.025342 *
as.factor(municipality)62 -1.39483 0.78976 -1.766 0.078449 .
as.factor(municipality)63 -1.93425 0.78976 -2.449 0.014926 *
as.factor(municipality)64 1.34952 0.78976 1.709 0.088589 .
as.factor(municipality)65 1.22640 0.78976 1.553 0.121571
as.factor(municipality)66 0.61152 0.78976 0.774 0.439395
as.factor(municipality)67 1.12397 0.78976 1.423 0.155784
as.factor(municipality)68 3.91228 0.78976 4.954 1.25e-06 ***
as.factor(municipality)69 -0.34874 0.78976 -0.442 0.659129
as.factor(municipality)70 1.83692 0.78976 2.326 0.020729 *
as.factor(municipality)71 7.36558 0.78976 9.326 < 2e-16 ***
as.factor(municipality)72 1.25170 0.78976 1.585 0.114103
as.factor(municipality)73 2.45402 0.79406 3.090 0.002197 **
as.factor(municipality)74 0.39580 0.78976 0.501 0.616643
as.factor(municipality)75 -1.92137 0.78976 -2.433 0.015600 *
as.factor(municipality)76 -2.00918 0.78976 -2.544 0.011490 *
as.factor(municipality)77 0.13002 0.78976 0.165 0.869350
as.factor(municipality)78 5.00639 0.78976 6.339 9.09e-10 ***
as.factor(municipality)79 -0.35407 0.78976 -0.448 0.654265
as.factor(municipality)80 2.74889 0.78976 3.481 0.000579 ***
as.factor(municipality)81 1.45656 0.78976 1.844 0.066184 .
as.factor(municipality)82 2.34503 0.79489 2.950 0.003442 **
as.factor(municipality)83 0.19903 0.78976 0.252 0.801210
as.factor(municipality)84 0.81944 0.78976 1.038 0.300348
as.factor(municipality)85 0.27779 0.79153 0.351 0.725886
as.factor(municipality)86 -1.58226 0.78976 -2.003 0.046080 *
as.factor(municipality)87 2.85189 0.78976 3.611 0.000361 ***
as.factor(municipality)88 -0.63218 0.78976 -0.800 0.424109
as.factor(municipality)89 -2.07780 0.78976 -2.631 0.008982 **
as.factor(municipality)90 -2.68335 0.78976 -3.398 0.000777 ***
as.factor(municipality)91 -0.79228 0.80084 -0.989 0.323354
as.factor(municipality)92 4.19012 0.78976 5.306 2.27e-07 ***
as.factor(municipality)93 0.37809 0.78976 0.479 0.632489
as.factor(municipality)94 -0.36779 0.78976 -0.466 0.641793
as.factor(municipality)95 0.17017 0.79185 0.215 0.829999
as.factor(municipality)96 0.24269 0.78976 0.307 0.758844
as.factor(year)2013 -0.14982 0.16121 -0.929 0.353502
as.factor(year)2015 -1.11176 0.16121 -6.896 3.48e-11 ***
as.factor(year)2016 0.21498 0.16757 1.283 0.200560
trarrprop 0.60611 0.13244 4.577 7.08e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.117 on 283 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.8635, Adjusted R-squared: 0.8157
F-statistic: 18.08 on 99 and 283 DF, p-value: < 2.2e-16
```

Note that there is no difficulty in incorporating a continuous treatment variable into this analysis. The parallel trends assumption remains the same – that treated and control groups would have followed the same dynamics in the absence of the treatment – but here we can interpret the treatment as having different *intensities* for different units.

The coefficient on the `trarrprop` variable implies that the arrival of each additional refugee per resident increases the Golden Dawn’s vote share by roughly 0.6 percentage points on average.

## 5.3 Homework

### 5.3.1 Minimum wages and employment – Card and Krueger (1994)

On April 1, 1992, the minimum wage in New Jersey was raised from $4.25 to $5.05. In the neighboring state of Pennsylvania, however, the minimum wage remained constant at $4.25. David Card and Alan Krueger (1994) analyze the impact of the minimum wage increase on employment in the fast-food industry, since this is a sector which employs many low-wage workers.

The authors collected data on the number of employees in 331 fast-food restaurants in New Jersey and 79 in Pennsylvania. The survey was conducted in February 1992 (before the minimum wage was raised) and in November 1992 (after the minimum wage was raised). The table below shows the average number of employees per restaurant:

| | February 1992 | November 1992 |
|---|---|---|
| New Jersey | 17.1 | 17.6 |
| Pennsylvania | 19.9 | 17.5 |

**Question 1.** *Using only the figures given in the table above, explain three possible ways to estimate the causal effect of the minimum wage increase on employment. For each approach, discuss which assumptions have to be made and what could bias the result.*

## Solution

Difference in means between the treatment and control groups in the post-treatment period. Assumption: no selection bias.

Difference in means for the treatment group in the pre- and post-treatment periods. Assumption: no effect of time independent of treatment.

Difference in differences. Assumption: Trend over time in the absence of the treatment is the same for treatment and control groups (parallel trends).
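The three estimates can be computed directly from the four figures in the table. The short sketch below does this; the object names are illustrative, and the numbers are the table values typed in by hand.

```r
# Average number of employees per restaurant, taken from the table above
nj_feb <- 17.1
nj_nov <- 17.6
pa_feb <- 19.9
pa_nov <- 17.5

# 1. Treatment vs control in the post-treatment period
cross_section <- nj_nov - pa_nov

# 2. Before vs after within the treatment group
before_after <- nj_nov - nj_feb

# 3. Difference-in-differences
did <- (nj_nov - nj_feb) - (pa_nov - pa_feb)

round(c(cross_section = cross_section, before_after = before_after, did = did), 1)
```

The three approaches give 0.1, 0.5 and 2.9 employees per restaurant respectively. The spread shows how much the answer depends on which identifying assumption we are willing to make.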

**Question 2.** *Replication exercise*

The dataset `m_wage.dta` that you downloaded earlier includes the information necessary to replicate the Card and Krueger analysis. In contrast to the Dinas data, the dataset here is stored in a “wide” format, i.e. there is a single row for each unit (restaurant), and different columns for the outcomes and covariates in different years. The dataset includes the following variables (as well as some others which we will not use):

- `nj` – a dummy variable equal to 1 if the restaurant is located in New Jersey
- `emptot` – the total number of full-time employed people in the pre-treatment period
- `emptot2` – the total number of full-time employed people in the post-treatment period
- `wage_st` – a variable measuring the average starting wage in the restaurant in the pre-treatment period
- `wage_st2` – a variable measuring the average starting wage in the restaurant in the post-treatment period
- `pmeal` – a variable measuring the average price of a meal in the pre-treatment period
- `pmeal2` – a variable measuring the average price of a meal in the post-treatment period
- `co_owned` – a dummy variable equal to 1 if the restaurant was co-owned
- `bk` – a dummy variable equal to 1 if the restaurant was a Burger King
- `kfc` – a dummy variable equal to 1 if the restaurant was a KFC
- `wendys` – a dummy variable equal to 1 if the restaurant was a Wendys

You will need the `read.dta` function from the `foreign` package to access this data (call `library(foreign)` before trying to call `read.dta`).

```
library(foreign)
min_wage <- read.dta("m_wage.dta")
```

*A)* Calculate the difference-in-difference estimate for the *average wage* in NJ and PA. Noting that the wage is not the outcome of interest in this case, what does this analysis suggest about the effectiveness of the minimum-wage policy?

Note that there are some observations with missing data in this exercise (these are coded as `NA` in the data). You can calculate the mean of a vector with missing values by setting the `na.rm` argument to `TRUE` in the `mean` function.
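To see what `na.rm` does, here is a small sketch using a made-up vector (the values are hypothetical and are not taken from `m_wage.dta`):

```r
# Toy vector of starting wages (hypothetical values) with one missing entry
wages <- c(4.25, 5.05, NA, 4.75)

# By default, a single NA makes the whole mean NA
mean_default <- mean(wages)

# na.rm = TRUE drops the missing value before averaging
mean_no_na <- mean(wages, na.rm = TRUE)

mean_default
mean_no_na
```

The second call averages only the three observed values, which is the behaviour you want for each of the group means in this exercise.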

## Solution

```
## Difference in mean starting wages (NJ minus PA) after and before the policy change
post_treatment_difference <- mean(min_wage$wage_st2[min_wage$nj == 1], na.rm = TRUE) -
  mean(min_wage$wage_st2[min_wage$nj == 0], na.rm = TRUE)
pre_treatment_difference <- mean(min_wage$wage_st[min_wage$nj == 1], na.rm = TRUE) -
  mean(min_wage$wage_st[min_wage$nj == 0], na.rm = TRUE)
## The diff-in-diff is the change in that gap
difference_in_difference <- post_treatment_difference - pre_treatment_difference
difference_in_difference
```

`[1] 0.4813823`

The average starting wage rates are clearly very similar in New Jersey and Pennsylvania before the change in the minimum wage for New Jersey. The pre-treatment difference in means is about 2 cents, whereas the post-treatment difference in means is nearly 50 cents. This suggests that the minimum wage increase was successfully adopted in most NJ restaurants.

*B)* Calculate the difference-in-differences estimator for the outcome of interest (the number of full-time employees). Under what conditions does this estimate identify the average treatment effect on the treated? What evidence do you have to support or refute these conditions here?

## Solution

```
## Difference in mean full-time employment (NJ minus PA) after and before the policy change
post_treatment_difference <- mean(min_wage$emptot2[min_wage$nj == 1], na.rm = TRUE) -
  mean(min_wage$emptot2[min_wage$nj == 0], na.rm = TRUE)
pre_treatment_difference <- mean(min_wage$emptot[min_wage$nj == 1], na.rm = TRUE) -
  mean(min_wage$emptot[min_wage$nj == 0], na.rm = TRUE)
## The diff-in-diff is the change in that gap
difference_in_difference <- post_treatment_difference - pre_treatment_difference
difference_in_difference
```

`[1] 2.753606`

The pre-treatment difference in means shows that New Jersey restaurants initially had noticeably fewer full-time employees than Pennsylvania restaurants, but the post-treatment calculation suggests that this gap in employment between NJ and PA stores had largely disappeared after the application of the treatment. Accordingly, the difference-in-differences (i.e. the difference in changes in employment) is positive, at about 2.75 employees per store, and indicates that NJ stores had a relative increase in employment compared to PA stores.

Of course, the crucial identifying assumption is that NJ and PA restaurants would have followed parallel employment trends in the absence of the minimum wage change. This assumption is very difficult to assess with the data we have here, as we do not have any information on employment outcomes for prior periods.
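It can help to write the estimator and the identifying assumption out explicitly. The notation below is my own shorthand rather than anything defined in the data: $\bar{Y}_{g,t}$ is the mean outcome in state $g$ and period $t$, and $Y_{0,t}$ is the potential outcome without the minimum wage increase.

$$
\hat{\delta}_{DiD} = \left(\bar{Y}_{NJ,\text{post}} - \bar{Y}_{PA,\text{post}}\right) - \left(\bar{Y}_{NJ,\text{pre}} - \bar{Y}_{PA,\text{pre}}\right)
$$

$$
E\left[Y_{0,\text{post}} - Y_{0,\text{pre}} \mid NJ = 1\right] = E\left[Y_{0,\text{post}} - Y_{0,\text{pre}} \mid NJ = 0\right]
$$

The second line is the parallel trends assumption: if it holds, the change observed in PA stands in for the change NJ would have experienced without the policy, and $\hat{\delta}_{DiD}$ estimates the average treatment effect on the treated.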

*C)* Calculate the difference-in-differences estimator for the price of an average meal. Did restaurants that were subject to the wage increase raise their prices for fast food?

## Solution

```
## Difference in mean meal prices (NJ minus PA) after and before the policy change
post_treatment_difference <- mean(min_wage$pmeal2[min_wage$nj == 1], na.rm = TRUE) -
  mean(min_wage$pmeal2[min_wage$nj == 0], na.rm = TRUE)
pre_treatment_difference <- mean(min_wage$pmeal[min_wage$nj == 1], na.rm = TRUE) -
  mean(min_wage$pmeal[min_wage$nj == 0], na.rm = TRUE)
## The diff-in-diff is the change in that gap
difference_in_difference <- post_treatment_difference - pre_treatment_difference
difference_in_difference
```

The difference-in-differences estimate suggests that there is a very small increase (about 8 cents) in the average price of a meal after the introduction of the new minimum wage in NJ. It does not appear to be the case that the costs of the minimum wage increase were passed on to consumers.
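Parts A to C repeat the same four-means calculation with different columns, so it may be convenient to wrap it in a function. The sketch below is one way to do this; `did_means` is a hypothetical helper name, and the toy data frame contains made-up numbers purely to show the call.

```r
# Difference-in-differences from group means for wide-format data.
# `data` needs a 0/1 treatment column plus one pre and one post outcome column.
did_means <- function(data, pre, post, treat = "nj") {
  treated <- data[[treat]] == 1
  control <- data[[treat]] == 0
  post_diff <- mean(data[[post]][treated], na.rm = TRUE) -
    mean(data[[post]][control], na.rm = TRUE)
  pre_diff <- mean(data[[pre]][treated], na.rm = TRUE) -
    mean(data[[pre]][control], na.rm = TRUE)
  post_diff - pre_diff
}

# Toy data (hypothetical values, not from m_wage.dta)
toy <- data.frame(
  nj     = c(1, 1, 1, 0, 0, 0),
  pmeal  = c(3.0, 3.2, NA, 2.9, 3.1, 3.0),
  pmeal2 = c(3.3, 3.5, 3.4, 3.0, NA, 3.0)
)

toy_did <- did_means(toy, pre = "pmeal", post = "pmeal2")
toy_did
```

Assuming `min_wage` is loaded as above, a call such as `did_means(min_wage, "pmeal", "pmeal2")` performs the same calculation as the solution code. As in that code, each mean uses whichever restaurants have non-missing values in that period.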

**Question 3. (Difficult)** *Convert the dataset from a “wide” format to a “long” format (i.e. where you have two observations for each restaurant, and an indicator for the time period in which the restaurant was observed). Estimate the difference-in-differences using linear regression. You should run two models: one which only includes the relevant variables to estimate the diff-in-diff, and one which additionally includes restaurant-level covariates which do not vary over time. Do your estimates of the treatment effect differ?*

Note: The easiest way to achieve the data conversion is to notice that you can simply “stack” one data.frame (with information from the pre-treatment period) on top of another data.frame (with information from the post-treatment period). So, first create two data.frames with the relevant variables. Second, bind these two data.frames together using the `rbind()` function (the data.frames must have the same column names before they are joined). Note that you will have to create the relevant treatment period indicator before binding the data.frames together.

## Solution

```
## Create two data.frames (one for pre-treatment and one for post-treatment period observations)
min_wage_feb <- min_wage[,c("nj","wage_st","emptot","kfc", "wendys","co_owned")]
min_wage_nov <- min_wage[,c("nj","wage_st2","emptot2","kfc", "wendys","co_owned")]
## Create a treatment period indicator
min_wage_feb$treatment_period <- 0
min_wage_nov$treatment_period <- 1
## Make sure the two data.frames have the same column names
colnames(min_wage_nov) <- colnames(min_wage_feb)
## Stack the data.frames on top of one another
min_wage_long <- rbind(min_wage_feb, min_wage_nov)
## Estimate the simple diff-in-diff
summary(lm(emptot ~ nj * treatment_period, min_wage_long))
```

```

Call:
lm(formula = emptot ~ nj * treatment_period, data = min_wage_long)

Residuals:
    Min      1Q  Median      3Q     Max
-21.166  -6.439  -1.027   4.473  64.561

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           23.331      1.072  21.767   <2e-16 ***
nj                    -2.892      1.194  -2.423   0.0156 *
treatment_period      -2.166      1.516  -1.429   0.1535
nj:treatment_period    2.754      1.688   1.631   0.1033
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.406 on 790 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.007401,  Adjusted R-squared:  0.003632
F-statistic: 1.964 on 3 and 790 DF,  p-value: 0.118
```
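To read the output above, it may help to write the model down and map each coefficient to a group mean. The Greek letters are my own labels for the four coefficients, not names used in the data:

$$
\text{emptot}_{it} = \alpha + \beta\,\text{nj}_i + \gamma\,\text{period}_t + \delta\,(\text{nj}_i \times \text{period}_t) + \varepsilon_{it}
$$

$$
\begin{aligned}
\text{PA, pre:} &\quad \hat{\alpha} = 23.331 \\
\text{NJ, pre:} &\quad \hat{\alpha} + \hat{\beta} = 23.331 - 2.892 = 20.439 \\
\text{PA, post:} &\quad \hat{\alpha} + \hat{\gamma} = 23.331 - 2.166 = 21.165 \\
\text{NJ, post:} &\quad \hat{\alpha} + \hat{\beta} + \hat{\gamma} + \hat{\delta} = 23.331 - 2.892 - 2.166 + 2.754 = 21.027
\end{aligned}
$$

The NJ–PA gap therefore moves from $-2.892$ before the policy to $21.027 - 21.165 = -0.138$ afterwards, a change of $2.754$. This is the interaction coefficient $\hat{\delta}$, and it matches the difference-in-means estimate from part B. Note that most of the change comes from a fall in PA employment ($-2.166$) rather than a rise in NJ ($+0.588$).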

```
## Estimate the covariate adjusted diff-in-diff
summary(lm(emptot ~ nj * treatment_period + kfc + wendys + co_owned, min_wage_long))
```

```

Call:
lm(formula = emptot ~ nj * treatment_period + kfc + wendys +
    co_owned, data = min_wage_long)

Residuals:
    Min      1Q  Median      3Q     Max
-23.799  -5.095  -1.139   3.349  63.605

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          25.6099     1.0241  25.008  < 2e-16 ***
nj                   -2.4416     1.0804  -2.260  0.02410 *
treatment_period     -2.2261     1.3699  -1.625  0.10456
kfc                  -9.7866     0.7734 -12.653  < 2e-16 ***
wendys               -0.5579     0.8912  -0.626  0.53153
co_owned             -1.7733     0.6417  -2.763  0.00586 **
nj:treatment_period   2.8564     1.5258   1.872  0.06157 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.498 on 787 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.1928,  Adjusted R-squared:  0.1866
F-statistic: 31.32 on 6 and 787 DF,  p-value: < 2.2e-16
```

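There is very little difference between the regression-adjusted estimate (2.86) and the raw diff-in-diff estimate (2.75). This is what we would expect: because the covariates do not vary over time, they mainly absorb residual variation in employment levels rather than changing the comparison of changes between the two states. The main visible effect is a smaller standard error on the interaction term (1.53 rather than 1.69).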
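As a cross-check on the regression approach, the sketch below uses a small made-up dataset (hypothetical values, with a hypothetical `id` column) to show that the interaction coefficient from `lm()` is the same number as the four-means difference-in-differences. It also uses base R's `reshape()` as an alternative to stacking with `rbind()`.

```r
# Toy wide-format data (hypothetical values; `id` is a made-up restaurant ID)
toy_wide <- data.frame(
  id      = 1:6,
  nj      = c(1, 1, 1, 0, 0, 0),
  emptot  = c(20, 22, 18, 25, 23, 24),
  emptot2 = c(21, 23, 19, 22, 21, 23)
)

# Reshape to long format: one row per restaurant-period
toy_long <- reshape(
  toy_wide,
  direction = "long",
  varying   = list(c("emptot", "emptot2")),
  v.names   = "emptot",
  timevar   = "treatment_period",
  times     = c(0, 1),
  idvar     = "id"
)

# Diff-in-diff from the interaction coefficient
fit <- lm(emptot ~ nj * treatment_period, data = toy_long)
did_reg <- unname(coef(fit)["nj:treatment_period"])

# Diff-in-diff from the four group means
did_raw <- with(
  toy_wide,
  (mean(emptot2[nj == 1]) - mean(emptot2[nj == 0])) -
    (mean(emptot[nj == 1]) - mean(emptot[nj == 0]))
)

all.equal(did_reg, did_raw)
```

The two agree because the model with both dummies and their interaction is saturated: it fits each of the four group means exactly. The same logic explains why the interaction coefficient in the simple model above (2.754) matches the estimate from Question 2 part B.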