2 Randomized Experiments
This week we will review the logic that underpins a research design that has become a mainstay of political science research: randomised experiments. We will focus on why randomisation is such a powerful force for making causal inferences (spoiler: internal validity), and will discuss the tradeoffs implicit in experimental research (spoiler: external validity). In learning how to analyse experimental data, we will review the t-test and also cover regression as a tool for analysing experiments.
The main motivation for using randomized experiments is that when treatments are randomly assigned, both observed and unobserved potentially confounding factors will be balanced across treatment and control conditions in expectation. That is, randomization solves the selection bias problem that we outlined last week. The intuition behind this result is nicely described in both the Mostly Harmless book (Chapter 2), and also in the Mastering ’Metrics book (see, in particular, Chapter 1). As Angrist and Pischke put it, “Random assignment works not by eliminating individual differences but rather by ensuring that the mix of individuals being compared is the same.” (MM, p. 16)
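The phrase "balanced in expectation" can be illustrated with a small simulation. This is only a sketch on made-up data: the variable ideology stands in for an unobserved confounder, and the sample size and number of repetitions are arbitrary choices, not values taken from any of the readings.

```r
# Simulated illustration (all names and numbers here are hypothetical)
set.seed(123)
n <- 1000
ideology <- rnorm(n)   # stands in for an "unobserved" confounder

# Repeat the random assignment many times and record the imbalance each time
imbalance <- replicate(2000, {
  treated <- sample(rep(c(0, 1), each = n / 2))   # complete random assignment
  mean(ideology[treated == 1]) - mean(ideology[treated == 0])
})

mean(imbalance)   # close to zero: balanced in expectation
sd(imbalance)     # any single assignment can still be slightly imbalanced
```

The average imbalance across assignments is close to zero even though the assignment never uses ideology, which is the sense in which randomization deals with confounders we cannot measure.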
For a review of statistical inference (sampling distributions, t-tests, standard errors, etc.) the Mastering ’Metrics book has a nice appendix on pages 33-46. The Gerber and Green book (chapter 3) is also very useful, and very clear, though note it pays a good deal of attention to randomization inference (which we do not cover on this course), rather than classical statistical inference methods.
Chattopadhyay and Duflo (2004) is an excellent example of using the randomized nature of a real-life policy implementation to draw conclusions about an important political science question. We will be using data from this paper throughout today’s seminar. For the purposes of this course you can ignore the theoretical section of the paper (though it’s worth reading, as it’s an interesting model), but in short it concludes that we should expect there to be differences in policy outcomes between areas that are and are not governed by female Pradhans (village chiefs). Instead you should focus on a) the detailed description of the randomization procedure, and b) what the authors did to ensure that randomization of treatment and control conditions was successful.
The Kalla and Broockman (2017) paper is an instructive example of using (several) field experiments to overcome very tricky selection bias issues in the context of a research question about the role of persuasion in politics. Another classic field experiment in political science is described in Gerber, Green and Larimer (2008), which in addition to being interesting in its own right also forms the basis of the second part of this week’s seminar task/homework. If it caught your interest in the lecture, you can also read the full paper by Banerjee et al. (2015), which describes the results from 6 important experiments that aim to establish the causal effects of development aid on outcomes for the poor.
Finally, randomized experiments are playing an increasingly important role in policymaking, and it is worth having a look at the Test, Learn, Adapt paper produced by the Behavioural Insights Team and the Cabinet Office, which represents a call to arms for experimental methods in developing better public policy. In addition to situating the experimental methods we study in a broader policymaking context, this paper has a nice set of examples of successful public policy experiments that have been conducted over the past 20 years.
2.1 Seminar
The main statistical machinery for analysing randomized experiments should be familiar to you all: t-tests and linear regression. We will also need a number of other functions today, most of which are listed in the table below.
| Function | Purpose |
|---|---|
| mean | Calculate the mean of a vector of numbers |
| var | Calculate the variance of a vector of numbers |
| sqrt | Calculate the square root of a number or vector of numbers |
| length | Calculate how many elements there are in a vector of numbers |
| t.test | Conduct a t-test |
| lm | Estimate a linear regression model |
Some of these functions are explained in more detail below. Remember, if you want to know how to use a particular function you can type ?function_name or help(function_name), or you can Google it!
Setting up the Working Directory
You should start your script each week with code similar to the following:
rm(list = ls())
setwd("path_to_my_PUBL0050_folder")
rm(list = ls()) is just telling R to remove everything from your current environment. For instance, if you create an object like we did in last week’s seminar, and then you run rm(list = ls()), that object will disappear from the environment panel in RStudio and you will no longer be able to access it. We normally put this line at the top of each script we work with so that we are beginning our analysis fresh each time.
setwd("path_to_my_PUBL0050_folder") tells R that you would like to work from (“set”) the folder (or “working directory”) of your choice. For example, I am keeping the code for this week in my PUBL0050 folder, which is in my Teaching folder, which is stored in my Dropbox folder. So I would use setwd("~/Dropbox/Teaching/PUBL0050").
Set up a subfolder called data within your PUBL0050 folder.
As our running example for the seminar, we will use (a simplified version of) the data from Chattopadhyay and Duflo (2004). We will also be using the data from the Gerber, Green and Larimer (2008) study on social pressure and turnout for the homework. Download these datasets from the links at the top of the page, then put them into the data folder that you just created.
2.1.1 Female politicians and policy outcomes – Chattopadhyay and Duflo (2004)
Chattopadhyay and Duflo ask whether there is a causal effect of having female politicians in government on public policy outcomes. That is, they ask whether women promote different policies than men. Cross-sectional comparisons – i.e. comparisons between political authorities with male and female leaders – are unlikely to result in unbiased estimates of the causal effect of interest, because different types of political areas are likely to differ in many ways other than just the gender of the political leader. For example, it is probably the case that more liberal districts will, on average, elect more female politicians, and so any difference in policy outcomes might be attributable to either politicians’ gender, or to district ideology.
To overcome this problem, Chattopadhyay and Duflo rely on the fact that in the mid-1990s, one-third of local councils in India (known as Gram Panchayats, or GPs) were randomly assigned to be “reserved” for leadership by female politicians. For each of these councils, the authors selected two villages in which to measure public policy outcomes. We will study this data below. Once you have downloaded the data and saved it to your computer, set your working directory to the folder in which that file is stored and then load the women.csv file into R using the read.csv function:
women <- read.csv("women.csv")
As you will see, there are 6 variables in this data.frame:
| Variable name | Description |
|---|---|
| GP | Indicator for “Gram Panchayat”, the level of local government studied |
| village | Indicator for villages within GP |
| reserved | Indicator for whether the GP was “reserved” for a female council head |
| female | Indicator for whether the council head was female |
| irrigation | Number of new or repaired irrigation systems in the village since new leader |
| water | Number of new or repaired drinking water systems in the village since new leader |
For the following questions, try writing the relevant code to answer the question without looking at the solutions.
1. Check whether or not the reservation policy was effectively implemented by seeing whether those GPs that were reserved did in fact have female politicians elected. Specifically, calculate the proportion of female leaders elected for reserved and unreserved GPs. What do you conclude?
Code Hint: You will need to use the subsetting operators that we used last week.
## Calculate the mean of female for those observations that were "reserved"
mean(women$female[women$reserved == 1])
## [1] 1
## Calculate the mean of female for those observations that were "unreserved"
mean(women$female[women$reserved == 0])
## [1] 0.07476636
## An alternative way to look at this is with prop.table and table
prop.table(table(women$reserved,women$female),1)
##
##              0          1
##   0 0.92523364 0.07476636
##   1 0.00000000 1.00000000
The reservation policy appears to have been followed correctly. All reserved GPs are led by women, compared with only 7.5% of unreserved GPs.
2. Calculate the estimated average treatment effect of reserved GPs for both irrigation and water.
## ATE drinking-water facilities
water_ate <- mean(women$water[women$reserved == 1]) -
  mean(women$water[women$reserved == 0])
## ATE irrigation facilities
irrigation_ate <- mean(women$irrigation[women$reserved == 1]) -
  mean(women$irrigation[women$reserved == 0])
water_ate
## [1] 9.252423
irrigation_ate
## [1] -0.3693319
On average, there were 9.3 more new or repaired drinking water facilities in reserved villages than in unreserved villages. By contrast, there were 0.4 fewer irrigation facilities in reserved villages.
3. Calculate the standard error of the difference in means for both irrigation and water. Code Hint: You can calculate the variance of a vector by using the var function. Remember also that to subset a vector you can use square brackets: my_vector[1:10]. Finally, the length function will allow you to calculate how many elements there are in any vector, or any subset of a vector.
Recall that \(\widehat{SE}_\text{ATE} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0}}\)
# Calculate the number of observations in the treatment and control groups
n_treat <- length(women$water[women$reserved == 1])
n_control <- length(women$water[women$reserved == 0])
## Calculate the standard error for the drinking-water facilities ATE
water_se <- sqrt(
  (var(women$water[women$reserved == 1])/n_treat) +
  (var(women$water[women$reserved == 0])/n_control)
)
## Calculate the standard error for the irrigation facilities ATE
irrigation_se <- sqrt(
  (var(women$irrigation[women$reserved == 1])/n_treat) +
  (var(women$irrigation[women$reserved == 0])/n_control)
)
water_se
## [1] 5.100282
irrigation_se
## [1] 0.9674094
4. Using the values you have just computed, calculate the test statistics for the difference in means and conduct a hypothesis test against the null hypothesis that the average treatment effect of a female-led council is zero (again, for both irrigation and water). Assume that the sampling distribution of the test statistic under the null hypothesis is well approximated by the standard normal distribution. Conduct your test at the 95% confidence level.
## Calculate the t-statistics
water_t_stat <- water_ate/water_se
irrigation_t_stat <- irrigation_ate/irrigation_se
water_t_stat
## [1] 1.8141
irrigation_t_stat
## [1] -0.3817742
The absolute values of both test statistics are below 1.96, which is the critical value of the standard normal distribution for a two-sided test at the 95% confidence level (i.e. when \(\alpha = 0.05\)). We therefore fail to reject the null hypothesis of no effect (though it is pretty close for the water outcome variable!).
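The same decision can be expressed in terms of p-values. The sketch below is an illustration rather than part of the seminar solutions: it types in the two test statistics from this question by hand and uses pnorm, the standard normal distribution function in base R, to compute two-sided p-values under the normal approximation assumed in the question.

```r
# Test statistics from the question above, typed in by hand for this illustration
water_z <- 1.8141
irrigation_z <- -0.3817742   # negative because the estimated ATE is negative

# Two-sided p-value: probability of a statistic at least this far from zero
# if the null hypothesis of no effect were true
p_water <- 2 * pnorm(-abs(water_z))
p_irrigation <- 2 * pnorm(-abs(irrigation_z))

# We reject the null at the 5% level only if the p-value is below 0.05
c(water = p_water, irrigation = p_irrigation) < 0.05
```

Comparing the p-value to 0.05 is equivalent to comparing the absolute test statistic to 1.96, so both routes lead to the same conclusion.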
5. Calculate the confidence intervals for these differences in means.
# Calculate the confidence intervals
water_upper_bound <- water_ate + 1.96*water_se
water_lower_bound <- water_ate - 1.96*water_se
irrigation_upper_bound <- irrigation_ate + 1.96*irrigation_se
irrigation_lower_bound <- irrigation_ate - 1.96*irrigation_se
# Present the results in a data.frame
out <- data.frame(outcome = c("Water","Irrigation"),
                  ate = c(water_ate,irrigation_ate),
                  upper_ci = c(water_upper_bound,irrigation_upper_bound),
                  lower_ci = c(water_lower_bound, irrigation_lower_bound))
out
##      outcome        ate  upper_ci   lower_ci
## 1      Water  9.2524230 19.248977 -0.7441306
## 2 Irrigation -0.3693319  1.526791 -2.2654545
6. What do the conclusions of these tests suggest about the effects of female leadership on policy outcomes?
There is no evidence that the reservation policy affected the number of irrigation systems in villages: the difference in means is small relative to its standard error. The reservation policy seems to have had a modest positive effect on the number of drinking water facilities. In particular, our best estimate of the average treatment effect suggests that the reservation policy increased the number of new or repaired drinking water facilities in a village by about 9 on average. That said, the estimates are sufficiently uncertain that we cannot reject the null hypothesis of no effect at the 95% confidence level for either of the outcome variables.
T-tests in R
It is relatively laborious to go through those steps each time you want to conduct a hypothesis test, and so normally we would just use functions built into R that allow us to do this more easily. The syntax for the main arguments for specifying a t-test in R is:
t.test(x, y, alt, mu, conf)
Let's have a look at the arguments.
| Argument | Description |
|---|---|
| x | A vector of values from one group of observations |
| y | A vector of values from a different group of observations |
| mu | The value for the difference in means under the null hypothesis. The default value is 0, but it could take on other values if required |
| alt | The alternative hypothesis (short for alternative). There are two alternatives to the null hypothesis that the difference in means is zero: the difference could either be smaller or it could be larger than zero. To test against both alternatives, we set alt = "two.sided". |
| conf | The confidence level (short for conf.level). Here, we set the level of confidence that we want in rejecting the null hypothesis. Common confidence levels are 95%, 99%, and 99.9%. |
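The names alt and conf are abbreviations. The full argument names listed in ?t.test are alternative and conf.level, and the shorter forms work because R matches argument names partially. A small sketch on simulated data (the vectors x and y are made up for illustration) suggesting that the two spellings give the same result:

```r
set.seed(1)
x <- rnorm(50, mean = 1)   # hypothetical "treatment" outcomes
y <- rnorm(60, mean = 0)   # hypothetical "control" outcomes

# Abbreviated argument names, as used in this seminar
short <- t.test(x = x, y = y, mu = 0, alt = "two.sided", conf = 0.95)

# Full argument names, as listed in ?t.test
full <- t.test(x = x, y = y, mu = 0, alternative = "two.sided", conf.level = 0.95)

all.equal(short$statistic, full$statistic)
all.equal(short$conf.int, full$conf.int)
```

Writing the full names is a little more typing but makes scripts easier to read for people who do not know the abbreviations.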
7. Using the t.test function, check that your answer to question 4 is correct. That is, use the t.test function to conduct hypothesis tests that the ATE of a female-led council is zero for both irrigation and drinking water investment.
t.test(x = women$water[women$reserved==1],
y = women$water[women$reserved==0],
mu = 0,
alt = "two.sided",
conf = 0.95)
##
## Welch Two Sample t-test
##
## data: women$water[women$reserved == 1] and women$water[women$reserved == 0]
## t = 1.8141, df = 122.05, p-value = 0.07212
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8440572 19.3489031
## sample estimates:
## mean of x mean of y
## 23.99074 14.73832
t.test(x = women$irrigation[women$reserved==1],
y = women$irrigation[women$reserved==0],
mu = 0,
alt = "two.sided",
conf = 0.95)
##
## Welch Two Sample t-test
##
## data: women$irrigation[women$reserved == 1] and women$irrigation[women$reserved == 0]
## t = -0.38177, df = 306.96, p-value = 0.7029
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.272925 1.534261
## sample estimates:
## mean of x mean of y
## 3.018519 3.387850
The p-values for the difference in means using the t.test function confirm the results we calculated manually above. The t.test p-value for the water ATE is 0.07, so we narrowly fail to reject the null at the 95% confidence level. The t.test p-value for the irrigation ATE is 0.70, which confirms that there is no clear treatment effect for this outcome variable.
Linear regression in R
Another approach to analysing experimental data is to specify a linear regression where we model our two outcome variables (irrigation, water) as a function of the treatment variable (reserved). Recall that in this setup, the estimated coefficient on the treatment variable will be equal to the difference in means we calculated above. The standard error, confidence intervals, and p-values will be very close to those above; they differ slightly only because lm() by default assumes a common error variance across groups, whereas the Welch t-test does not.
We run linear regressions using the lm() function in R (lm stands for Linear Model). The lm() function needs to know a) the relationship we’re trying to model and b) the dataset for our observations. The two arguments we need to provide to the lm() function are described below.
| Argument | Description |
|---|---|
| formula | The formula describes the relationship between the dependent and independent variables, for example dependent.variable ~ independent.variable |
| data | The name of the dataset that contains the variables of interest |
For more information on how the lm() function works, type help(lm) in R.
8. Specify linear models for water and irrigation as a function of reserved. Assign the output of these models to objects with sensible names. Use the summary function on these objects to examine the coefficients, standard errors and p-values.
# Estimate linear models
water_lm <- lm(water ~ reserved, data = women)
irrigation_lm <- lm(irrigation ~ reserved, data = women)
# Print the fitted models (summary(water_lm) additionally reports standard errors and p-values)
water_lm
##
## Call:
## lm(formula = water ~ reserved, data = women)
##
## Coefficients:
## (Intercept)     reserved
##      14.738        9.252
irrigation_lm
##
## Call:
## lm(formula = irrigation ~ reserved, data = women)
##
## Coefficients:
## (Intercept)     reserved
##      3.3879      -0.3693
The regression estimate of the difference in means (i.e. \(\beta_\text{reserved}\)) is 9.3 for the drinking water outcome and -0.4 for the irrigation outcome, which are the same as the manually calculated differences.
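The equivalence between the regression coefficient and the difference in means holds for any binary treatment, not just in this dataset. A minimal sketch on simulated data (the data frame toy and the numbers used to generate it are hypothetical) that checks the equality and also shows where the standard errors and p-values can be found:

```r
set.seed(42)
n <- 200
d <- rbinom(n, size = 1, prob = 0.5)   # hypothetical binary treatment
y <- 10 + 3 * d + rnorm(n, sd = 5)     # hypothetical outcome
toy <- data.frame(y = y, d = d)

# Difference in means computed by hand
diff_means <- mean(toy$y[toy$d == 1]) - mean(toy$y[toy$d == 0])

# Coefficient on the treatment indicator from a bivariate regression
toy_lm <- lm(y ~ d, data = toy)

all.equal(diff_means, unname(coef(toy_lm)["d"]))   # the two estimates coincide

# Standard errors, t-statistics and p-values are in the coefficient table
summary(toy_lm)$coefficients
```

The intercept in such a model is the control group mean, which matches the 14.738 and 3.3879 reported above for the unreserved villages.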
2.1.2 Reanalysis of Gerber, Green and Larimer (2008)
‘Why do large numbers of people vote, despite the fact that, as Hegel once observed, “the casting of a single vote is of no significance where there is a multitude of electors”?’
This is the question that drives the experimental analysis of Gerber, Green and Larimer (2008). If it is irrational to vote because the costs of doing so (time spent informing oneself, time spent getting to the polling station, etc.) are clearly greater than the gains to be made from voting (the probability that any individual voter will be decisive in an election is vanishingly small), then why do we observe millions of people voting in elections? One commonly proposed answer is that voters may have some sense of civic duty which drives them to the polls. Gerber, Green and Larimer investigate this idea empirically by priming voters to think about civic duty while also varying the amount of social pressure voters are subject to.
In a field experiment in advance of the 2006 primary election in Michigan, nearly 350,000 voters were assigned at random to one of four treatment groups, where voters received mailouts which encouraged them to vote, or a control group where voters received no mailout. The treatment and control conditions were as follows:
 Treatment 1 (“Civic duty”): Voters receive mailout reminding them that voting is a civic duty
 Treatment 2 (“Hawthorne”): Voters receive mailout telling them that researchers would be studying their turnout based on public records
 Treatment 3 (“Self”): Voters receive mailout displaying the record of turnout for their household in prior elections.
 Treatment 4 (“Neighbors”): Voters receive mailout displaying the record of turnout for their household and their neighbours’ households in prior elections.
 Control: Voters receive no mailout.
Load the replication data for Gerber, Green and Larimer (2008). This data is stored in a .Rdata format, which is the main way to save data in R. Therefore you will not be able to use read.csv but instead should use the function load.
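Unlike read.csv, load does not hand back a data.frame for you to assign to a name. It restores the saved objects into your environment under the names they were saved with. The sketch below illustrates this with a toy object and a temporary file, both of which are hypothetical and used only so that the example runs on its own.

```r
# Toy object saved to a temporary file purely for illustration
example_df <- data.frame(id = 1:3, voted = c(0, 1, 1))
tmp <- tempfile(fileext = ".Rdata")
save(example_df, file = tmp)

rm(example_df)              # remove it, as if starting a fresh session
loaded_names <- load(tmp)   # load() restores objects under their saved names

loaded_names                # names of the objects that were restored
str(example_df)             # the object is back in the environment
```

Storing the return value of load, as done here with loaded_names, is a convenient way to find out what a .Rdata file contained.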
Once you have loaded the data, familiarise yourself with the gerber object, which should be in your current environment. Use the str and summary functions to get an idea of what is in the data. There are 5 variables in this data.frame:
| Variable name | Description |
|---|---|
| voted | Indicator for whether the voter voted in the 2006 election (1) or did not vote (0) |
| treatment | Factor variable indicating which treatment arm (or control group) the voter was allocated to |
| sex | Sex of the respondent |
| yob | Year of birth of the respondent |
| p2004 | Indicator for whether the voter voted in the 2004 election (Yes) or not (No) |
 Calculate the turnout rates for each of the experimental groups (4 treatments, 1 control). Calculate the number of individuals allocated to each group. Recreate table 2 on p. 38 of the paper.
Here is one (somewhat laborious) way of constructing the table:
## Calculate the mean outcome for each condition
y_bar_control <- mean(gerber$voted[gerber$treatment == "Control"])
y_bar_civic <- mean(gerber$voted[gerber$treatment == "Civic Duty"])
y_bar_hawthorne <- mean(gerber$voted[gerber$treatment == "Hawthorne"])
y_bar_self <- mean(gerber$voted[gerber$treatment == "Self"])
y_bar_neighbor <- mean(gerber$voted[gerber$treatment == "Neighbors"])
## Calculate the total number of observations for each condition
n_control <- sum(gerber$treatment == "Control")
n_civic <- sum(gerber$treatment == "Civic Duty")
n_hawthorne <- sum(gerber$treatment == "Hawthorne")
n_self <- sum(gerber$treatment == "Self")
n_neighbor <- sum(gerber$treatment == "Neighbors")
## Concatenate into two vectors (using "round" to round the percentages to one decimal place)
percentages <- round(c(y_bar_control,y_bar_civic,y_bar_hawthorne,
                       y_bar_self, y_bar_neighbor)*100,1)
totals <- c(n_control, n_civic, n_hawthorne, n_self, n_neighbor)
## Combine into a data.frame object
table_two <- data.frame(rbind(percentages, totals))
## Provide the correct names
rownames(table_two) <- c("Percentage voting", "N of individuals")
colnames(table_two) <- c("Control", "Civic Duty", "Hawthorne", "Self", "Neighbors")
print(table_two)
##                    Control Civic Duty Hawthorne    Self Neighbors
## Percentage voting     29.7       31.5      32.2    34.5      37.8
## N of individuals  191243.0    38218.0   38204.0 38218.0   38201.0
Here is an alternative way that is more efficient, but the code may be less readable and take more work to figure out what is going on:
## Calculate the mean outcome for each condition using the aggregate function
y_bars <- aggregate(gerber$voted, list(gerber$treatment),
                    FUN = function(x) round(mean(x)*100,1))
## Calculate the number of observations for each condition using the table function
ns <- table(gerber$treatment)
## Combine into a data.frame object
table_two2 <- data.frame(rbind(t(y_bars),ns)[2:3,])
## Provide the correct names
rownames(table_two2) <- c("Percentage voting", "N of individuals")
# Uncomment this to see that it creates the same table as above
# print(table_two2)
For those who are motivated, you can use the package kableExtra to faithfully recreate the table. Below is the code for it. You can find a lot of help for the package online.
library(kableExtra)
# add the percentage signs
table_two[1,1:5] <- paste0(table_two[1,1:5],"%")
kable(table_two) %>% # You may recognise the pipe symbol from the tidyverse
  kable_paper() %>% # There are a couple of different themes to choose from
  add_header_above(c(" ", "Experimental Group"=5)) %>%
  add_header_above(
    c("TABLE 2. Effects of Four Mail Treatments on Voter Turnout in the August 2006 Primary Election"=6),
    align="l", bold=TRUE, font_size=20)
TABLE 2. Effects of Four Mail Treatments on Voter Turnout in the August 2006 Primary Election
| Experimental Group | Control | Civic Duty | Hawthorne | Self | Neighbors |
|---|---|---|---|---|---|
| Percentage voting | 29.7% | 31.5% | 32.2% | 34.5% | 37.8% |
| N of individuals | 191243 | 38218 | 38204 | 38218 | 38201 |
 Conduct a series of t-tests between each treatment condition and the control condition. Present the results of the t-tests either as confidence intervals for the difference in means, or as a p-value for the null hypothesis that \(\hat{Y}_c = \hat{Y}_t\).
t.test(x = gerber$voted[gerber$treatment == "Civic Duty"],
y = gerber$voted[gerber$treatment == "Control"])$conf.int
## [1] 0.01281368 0.02298501
## attr(,"conf.level")
## [1] 0.95
t.test(x = gerber$voted[gerber$treatment == "Hawthorne"],
y = gerber$voted[gerber$treatment == "Control"])$conf.int
## [1] 0.02062181 0.03085081
## attr(,"conf.level")
## [1] 0.95
t.test(x = gerber$voted[gerber$treatment == "Self"],
y = gerber$voted[gerber$treatment == "Control"])$conf.int
## [1] 0.04332558 0.05370080
## attr(,"conf.level")
## [1] 0.95
t.test(x = gerber$voted[gerber$treatment == "Neighbors"],
y = gerber$voted[gerber$treatment == "Control"])$conf.int
## [1] 0.07603405 0.08658577
## attr(,"conf.level")
## [1] 0.95
In all cases, the difference between the treatment and control condition is statistically significant at the 95% level.
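Typing out four nearly identical t.test calls is fine for a handful of arms, but the repetition can be avoided with sapply. The sketch below runs on a simulated stand-in for the real data: the data frame toy, its size, and the turnout probabilities are all made up for illustration and are not the Gerber, Green and Larimer results.

```r
set.seed(2008)
arms <- c("Control", "Civic Duty", "Hawthorne", "Self", "Neighbors")

# Simulated stand-in for the real data (hypothetical turnout probabilities)
toy <- data.frame(
  treatment = factor(sample(arms, 5000, replace = TRUE), levels = arms)
)
turnout_prob <- c(0.30, 0.32, 0.32, 0.35, 0.38)[as.integer(toy$treatment)]
toy$voted <- rbinom(nrow(toy), size = 1, prob = turnout_prob)

# One Welch t-test per treatment arm, each against the control group
treatment_arms <- setdiff(arms, "Control")
results <- sapply(treatment_arms, function(arm) {
  tt <- t.test(x = toy$voted[toy$treatment == arm],
               y = toy$voted[toy$treatment == "Control"])
  c(lower = tt$conf.int[1], upper = tt$conf.int[2], p_value = tt$p.value)
})

round(t(results), 4)   # one row per treatment arm
```

With the real data, replacing toy with gerber should give the same confidence intervals as the four separate calls above.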
 Use the following code to create three new variables in the data.frame. First, a variable that is equal to 1 if a respondent is female, and 0 otherwise. Second, a variable that measures the age of each voter in years at the time of the experiment (which was conducted in 2006). Third, a variable that is equal to 1 if the voter voted in the November 2004 general election, and 0 otherwise.
## Female dummy variable
gerber$female <- ifelse(gerber$sex == "female", 1, 0)
## Age variable
gerber$age <- 2006 - gerber$yob
## 2004 variable
gerber$turnout04 <- ifelse(gerber$p2004 == "Yes", 1, 0)
Using these variables, conduct balance checks to establish whether there are potentially confounding differences between treatment and control groups.
## Balance
m1 <- lm(female ~ treatment, data = gerber)
m2 <- lm(age ~ treatment, data = gerber)
m3 <- lm(turnout04 ~ treatment, data = gerber)
# Presenting
library(stargazer)
stargazer(m1,m2,m3,
          title = "Balance Checks",
          type="html",
          keep.stat = c("n","adj.rsq"),
          dep.var.caption = "",
          dep.var.labels = c("Gender","Age","Turnout 2004"),
          intercept.bottom = FALSE,
          intercept.top = TRUE,
          covariate.labels = levels(as.factor(gerber$treatment)),
          star.cutoffs = c(.05,.01,.001))
| | Gender (1) | Age (2) | Turnout 2004 (3) |
|---|---|---|---|
| Control | 0.499*** (0.001) | 49.814*** (0.033) | 0.400*** (0.001) |
| Civic Duty | 0.001 (0.003) | 0.155 (0.081) | 0.001 (0.003) |
| Hawthorne | 0.0001 (0.003) | 0.109 (0.081) | 0.003 (0.003) |
| Self | 0.001 (0.003) | 0.021 (0.081) | 0.002 (0.003) |
| Neighbors | 0.001 (0.003) | 0.039 (0.081) | 0.006* (0.003) |
| Observations | 344,084 | 344,084 | 344,084 |
| Adjusted R² | 0.00001 | 0.00000 | 0.00001 |
Note: *p<0.05; **p<0.01; ***p<0.001. Standard errors in parentheses.
Looking at these three pre-treatment covariates, there is little evidence of imbalance across the treatment and control groups. There are no significant gender or age differences between the control group and any of the treatment groups. There is some evidence that a slightly higher proportion of voters turned out to vote in 2004 in the “Neighbors” treatment condition than in the control group (i.e. \(p < 0.05\)), but the difference is very small: turnout was about half a percentage point higher in the treatment group than in the control group (where turnout was about 40%). With twelve comparisons in the table, one difference that is significant at this level is not surprising. Overall, this table does not indicate any failure of randomization.
 Estimate the average treatment effects of the different treatment arms whilst controlling for the variables you created for the question above. How do these estimates differ from regression estimates of the treatment effects only (i.e. without controlling for other factors)? Why?
# Estimate a baseline model
baseline_model <- lm(voted ~ treatment, data = gerber)
# Estimate a model with covariates
covariate_model <- lm(voted ~ treatment + female + age + turnout04, data = gerber)
# Table with only our treatment effects
stargazer(baseline_model,covariate_model,
          type="html",
          dep.var.caption = "",
          dep.var.labels = "Turnout",
          column.labels = c("Baseline","w/ Covariates"),
          keep = c("Constant","treatment"),
          keep.stat = c("n","adj.rsq"),
          intercept.bottom = FALSE,
          intercept.top = TRUE,
          covariate.labels = levels(as.factor(gerber$treatment)),
          star.cutoffs = c(.05,.01,.001))
Dependent variable: Turnout
| | Baseline (1) | w/ Covariates (2) |
|---|---|---|
| Control | 0.297*** (0.001) | 0.044*** (0.003) |
| Civic Duty | 0.018*** (0.003) | 0.019*** (0.003) |
| Hawthorne | 0.026*** (0.003) | 0.026*** (0.003) |
| Self | 0.049*** (0.003) | 0.048*** (0.003) |
| Neighbors | 0.081*** (0.003) | 0.080*** (0.003) |
| Observations | 344,084 | 344,084 |
| Adjusted R² | 0.003 | 0.045 |
Note: *p<0.05; **p<0.01; ***p<0.001. Standard errors in parentheses.
As expected from a randomized experiment, controlling for pre-treatment covariates has very little consequence for the estimated treatment effects. Because the covariates are balanced in expectation (and in this particular randomization there is also very little imbalance across the treatment arms), estimating the treatment effects conditional on covariates gives estimates that are very similar to the baseline estimates.
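Although adjustment barely moves the point estimates in a randomized experiment, covariates can still be useful for precision when they predict the outcome strongly. The following sketch uses simulated data in which this is true by construction (the variables d, x and y and all coefficients are hypothetical); it is meant to illustrate the general logic, not to describe the turnout data.

```r
set.seed(99)
n <- 5000
d <- rbinom(n, size = 1, prob = 0.5)    # randomly assigned treatment
x <- rnorm(n)                           # pre-treatment covariate
y <- 1 + 0.5 * d + 2 * x + rnorm(n)     # outcome depends strongly on x

unadjusted <- lm(y ~ d)
adjusted   <- lm(y ~ d + x)

# Point estimates are similar because x is unrelated to d by design
est <- c(unadjusted = unname(coef(unadjusted)["d"]),
         adjusted   = unname(coef(adjusted)["d"]))

# Standard errors shrink because x absorbs residual variation in y
se <- c(unadjusted = summary(unadjusted)$coefficients["d", "Std. Error"],
        adjusted   = summary(adjusted)$coefficients["d", "Std. Error"])

rbind(estimate = est, std_error = se)
```

How much precision is gained depends on how well the covariates predict the outcome, so the size of the improvement will differ from one application to another.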
 Estimate the treatment effects separately for men and women. Do you note any differences in the impact of the treatment amongst these subgroups?
There are two ways of estimating these effects separately for men and women. First, you could simply estimate the same model on different subsets of the data:
# Estimate regression models on subsets of data
male_model <- lm(voted ~ treatment, data = gerber[gerber$female == 0,])
female_model <- lm(voted ~ treatment, data = gerber[gerber$female == 1,])
# Construct a data.frame with the treatment coefficients from each
coef_compare <- data.frame(male = coef(male_model)[2:5],
                           female = coef(female_model)[2:5])
coef_compare
##                           male     female
## treatmentCivic Duty 0.01994637 0.01588446
## treatmentHawthorne  0.02468701 0.02679139
## treatmentSelf       0.04575431 0.05129251
## treatmentNeighbors  0.08174818 0.08089951
The treatment effects are in fact very similar between men and women. The largest difference in effect size is for the “Self” treatment condition, but even here the difference is only about half a percentage point. Both men and women seem equally likely to respond to appeals to civic duty and social pressure when making the decision to turn out to vote.
An alternative approach is to include an interaction between the treatment variable and the female variable:
interaction_model <- lm(voted ~ treatment * female, data = gerber)
summary(interaction_model)
##
## Call:
## lm(formula = voted ~ treatment * female, data = gerber)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.3845 -0.3172 -0.2905  0.6583  0.7095
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 0.3027947  0.0014991 201.986  < 2e-16 ***
## treatmentCivic Duty         0.0199464  0.0036770   5.425 5.81e-08 ***
## treatmentHawthorne          0.0246870  0.0036740   6.719 1.83e-11 ***
## treatmentSelf               0.0457543  0.0036752  12.450  < 2e-16 ***
## treatmentNeighbors          0.0817482  0.0036774  22.230  < 2e-16 ***
## female                     -0.0123389  0.0021223  -5.814 6.11e-09 ***
## treatmentCivic Duty:female -0.0040619  0.0052002  -0.781    0.435
## treatmentHawthorne:female   0.0021044  0.0052010   0.405    0.686
## treatmentSelf:female        0.0055382  0.0052002   1.065    0.287
## treatmentNeighbors:female  -0.0008487  0.0052012  -0.163    0.870
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.464 on 344074 degrees of freedom
## Multiple R-squared: 0.003569, Adjusted R-squared: 0.003542
## F-statistic: 136.9 on 9 and 344074 DF, p-value: < 2.2e-16
You can also plot the predicted values depending on treatment and gender with the following code and the package sjPlot. It uses ggplot2 grammar, so it’s easy to customise.
library(sjPlot)
library(ggplot2)
library(ggthemes)
library(wesanderson) # to make colours interesting
# Change the female variable so it has meaningful labels
gerber$female <- as.factor(gerber$female)
levels(gerber$female) <- c("Men","Women")
interaction_model <- lm(voted ~ treatment*female, data = gerber)
# Plot the model
plot_model(interaction_model,type="pred",
           terms = c("treatment","female"),
           colors = c(wes_palette("GrandBudapest2")[1],
                      wes_palette("GrandBudapest2")[2])) +
  labs(y = "Estimated probability of turnout", x = "", title = "") +
  theme_bw() +
  guides(color = guide_legend(""))
The effect of each treatment condition relative to the control condition for men is simply the coefficient associated with each treatment indicator. So, the effect of the “Civic Duty” treatment on men is that it increased turnout by 0.01995 (about 2 percentage points). Note that this is exactly the same as the effect we estimated using the male_model above.
The effect of each treatment condition for women is calculated by taking the sum of the coefficient associated with each treatment indicator and the coefficient associated with the interaction between that indicator and the female variable. So, for example, the effect of the “Civic Duty” treatment for women is 0.01995 - 0.0041 = 0.01585. Again, this is directly equivalent (up to rounding error) to the effect size we calculated for the female_model above.
The advantage of using the interaction model is that we can directly assess, from a statistical perspective, whether the differences in treatment effects between men and women are significant. We can see from the regression output that they are not: none of the interaction effects is significantly different from zero (the t-statistics are very small, and the p-values are large), so we cannot reject the hypothesis that the treatments are equally effective for men and women.
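The arithmetic linking the interaction model to the two subgroup models can be checked directly in code. The sketch below uses simulated data with a single hypothetical treatment (the data frame toy, the variable treat, and the turnout probabilities are made up), so the coefficient names differ from those in the output above.

```r
set.seed(7)
n <- 4000
toy <- data.frame(
  treat  = rbinom(n, size = 1, prob = 0.5),   # hypothetical single treatment
  female = rbinom(n, size = 1, prob = 0.5)
)
# Hypothetical turnout probabilities, made up for this illustration
p <- 0.30 + 0.05 * toy$treat + 0.01 * toy$female + 0.02 * toy$treat * toy$female
toy$voted <- rbinom(n, size = 1, prob = p)

interaction_fit <- lm(voted ~ treat * female, data = toy)
b <- coef(interaction_fit)

# Effect for men: main coefficient; effect for women: main + interaction
effect_men   <- unname(b["treat"])
effect_women <- unname(b["treat"] + b["treat:female"])

# The same numbers come out of models fitted to each subgroup separately
men_fit   <- lm(voted ~ treat, data = toy[toy$female == 0, ])
women_fit <- lm(voted ~ treat, data = toy[toy$female == 1, ])

all.equal(effect_men, unname(coef(men_fit)["treat"]))
all.equal(effect_women, unname(coef(women_fit)["treat"]))
```

The two approaches agree exactly because the interaction model has a separate parameter for every combination of treatment and gender; what the interaction model adds is a standard error for the difference between the two effects.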
2.2 Quiz
 What does the expression \(E[Y_{1i}] = E[Y_i \mid D_i = 1]\) mean?
 That the observed outcomes of the treatment group are representative of the population of treated potential outcomes
 That the potential outcomes of the treatment group are representative of the population of treated observed outcomes
 That expectations are linear
 That the expected values of the untreated potential outcomes in treatment and control groups are different
 What does “unbiasedness” of an estimator mean?
 That if I iterated the sampling procedure infinitely, the sampling distribution would converge around the true value
 That if I iterated the sampling procedure infinitely, the mean of the sampling distribution would be the true value
 That if I iterated the sampling procedure infinitely, the variance of the sampling distribution would be small
 That if I used it to estimate a parameter I would guess its true value
 When we refer to the sampling distribution of an estimator in the context of an experiment, what is being iteratively sampled?
 The units selected in the experiment
 The timing of the experiment
 The random assignment of units to treatment, in the sample
 The random assignment of units to treatment, and the sample
 What can we use covariates for, in the context of an experiment?
 To reduce bias of our estimated treatment effect and control for confounders
 To perform balance checks, increase precision of our estimates, and estimate heterogeneous treatment effects
 To perform balance checks and estimate heterogeneous treatment effects
 Covariates have no use in an experiment
 What is the issue of selection bias in causal inference?
 The treatment applied is ideologically biased
 The units in our sample are biased
 That units selected into treatment are fundamentally different than units selected into control and therefore our estimated treatment effect will be biased (i.e. systematically wrong)
 The researcher is biased