2 Randomized Experiments
This week we will review the logic that underpins a research design that has become a mainstay of political science research: randomised experiments. We will focus on why randomisation is such a powerful force for making causal inferences (spoiler: internal validity), and will discuss the trade-offs implicit in experimental research (spoiler: external validity). In learning how to analyse experimental data, we will review the t-test and also cover regression as a tool for analysing experiments.
The main motivation for using randomized experiments is that when treatments are randomly assigned, both observed and unobserved potentially confounding factors will be balanced across treatment and control conditions in expectation. That is, randomization solves the selection bias problem that we outlined last week. The intuition behind this result is nicely described in both the Mostly Harmless book (Chapter 2), and also in the Mastering ’Metrics book (see, in particular, Chapter 1). As Angrist and Pischke put it, “Random assignment works not by eliminating individual differences but rather by ensuring that the mix of individuals being compared is the same.” (MM, p. 16)
For a review of statistical inference (sampling distributions, t-tests, standard errors, etc.) the Mastering ’Metrics book has a nice appendix on pages 33-46. The Gerber and Green book (chapter 3) is also very useful, and very clear, though note that it pays a good deal of attention to randomization inference (which we do not cover on this course), rather than classical statistical inference methods.
Chattopadhyay and Duflo (2004) is an excellent example of using the randomized nature of a real-life policy implementation to draw conclusions about an important political science question. We will be using data from this paper throughout today’s seminar. For the purposes of this course you can ignore the theoretical section of the paper (though it’s worth reading, as it’s an interesting model), but in short it concludes that we should expect there to be differences in policy outcomes between areas that are and are not governed by female Pradhans (village chiefs). Instead you should focus on a) the detailed description of the randomization procedure, and b) what the authors did to ensure that randomization of treatment and control conditions was successful.
The Kalla and Broockman (2017) paper is an instructive example of using (several) field experiments to help overcome very tricky issues of selection bias, in the context of a research question that focuses on the role of persuasion in politics. Another classic field experiment in political science is described in Gerber, Green and Larimer (2008), which in addition to being interesting in its own right also forms the basis of the second part of this week’s seminar task/homework. If it caught your interest in the lecture, you can also read the full paper by Banerjee et al. (2015), which describes the results from six important experiments that aim to establish the causal effects of development aid on outcomes for the poor.
Finally, randomized experiments are playing an increasingly important role in policy-making, and it is worth having a look at the Test, Learn, Adapt paper produced by the Behavioural Insights Team and the Cabinet Office, which represents a call-to-arms for experimental methods in developing better public policy. In addition to situating the experimental methods we study in a broader policy-making context, this paper has a nice set of examples of successful public policy experiments that have been conducted over the past 20 years.
2.1 Seminar
The main statistical machinery for analysing randomized experiments should be familiar to you all: t-tests and linear regression. We will also need a number of other functions today, most of which are listed in the table below.
Function | Purpose |
---|---|
`mean` | Calculate the mean of a vector of numbers |
`var` | Calculate the variance of a vector of numbers |
`sqrt` | Calculate the square-root of a number or vector of numbers |
`length` | Calculate how many elements there are in a vector of numbers |
`t.test` | Conduct a t-test |
`lm` | Estimate a linear regression model |
Some of these functions are explained in more detail below. Remember, if you want to know how to use a particular function you can type `?function_name` or `help(function_name)`, or you can Google it!
Setting up the Working Directory
You should start your script each week with two lines similar to `rm(list = ls())` and `setwd("path_to_my_PUBL0050_folder")`.
- `rm(list = ls())` is just telling R to remove everything from your current environment. For instance, if you create an object like we did in last week’s seminar, and then you run `rm(list = ls())`, that object will disappear from the environment panel in RStudio and you will no longer be able to access it. We normally put this line at the top of each script we work with so that we are beginning our analysis fresh each time.
- `setwd("path_to_my_PUBL0050_folder")` tells R that you would like to work from (“set”) the folder (or, “working directory”) of your choice. For example, I am keeping the code for this week in my `PUBL0050` folder, which is in my `Teaching` folder, which is stored in my `Dropbox` folder. So I would use `setwd("~/Dropbox/Teaching/PUBL0050")`.
- Set up a subfolder called `data` within your `PUBL0050` folder
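A minimal sketch of such a script header is given below. The path is the example used in the text and should be replaced with your own. The `course_dir` name and the `dir.exists` guard are additions for illustration, so that the lines run without error on a machine where that folder does not exist.

```r
# Start each session from an empty environment
rm(list = ls())

# Example path from the text; replace it with the location of your own folder
course_dir <- "~/Dropbox/Teaching/PUBL0050"

# Only switch directory if the folder exists (guard added for illustration)
if (dir.exists(course_dir)) {
  setwd(course_dir)

  # Create the data subfolder unless it is already there
  if (!dir.exists("data")) {
    dir.create("data")
  }
}

# Print the current working directory to confirm where R will look for files
getwd()
```

If `getwd()` does not print the folder you expected, the path passed to `setwd` is the first thing to check.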
As our running example for the seminar, we will use (a simplified version of) the data from Chattopadhyay and Duflo (2004). We will also be using the data from the Gerber, Green and Larimer (2008) study on social pressure and turnout for the homework. Download these datasets from the links at the top of the page, then put them into the `data` folder that you just created.
2.1.1 Female politicians and policy outcomes – Chattopadhyay and Duflo (2004)
Chattopadhyay and Duflo ask whether there is a causal effect of having female politicians in government on public policy outcomes. That is, they ask whether women promote different policies than men. Cross-sectional comparisons – i.e. comparisons between political authorities with male and female leaders – are unlikely to result in unbiased estimates of the causal effect of interest, because different types of political areas are likely to differ in many ways other than just the gender of the political leader. For example, it is probably the case that more liberal districts will, on average, elect more female politicians, and so any difference in policy outcomes might be attributable to either politicians’ gender, or to district ideology.
To overcome this problem, Chattopadhyay and Duflo rely on the fact that in the mid-1990s, one-third of local councils in India (known as Gram Panchayat, or GPs) were randomly assigned to be “reserved” for leadership by female politicians. For each of these councils, the authors selected two villages to measure outcomes about public policy. We will study this data below. Once you have downloaded the data and saved it to your computer, set your working directory to the folder in which that file is stored and then load the `women.csv` file into R using the `read.csv` function:
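A sketch of this step, assuming the file was saved as `data/women.csv` relative to your working directory (adjust the path if you stored it elsewhere). The `file.exists` check and the `women_path` name are additions for illustration, so that the lines run harmlessly before the file has been downloaded.

```r
# Assumed location of the downloaded file, relative to the working directory
women_path <- "data/women.csv"

if (file.exists(women_path)) {
  # Read the comma-separated file into a data.frame
  women <- read.csv(women_path)

  # Inspect the structure and the first few rows
  str(women)
  head(women)
} else {
  message("File not found: check your working directory with getwd()")
}
```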
As you will see, there are 6 variables in this data.frame:
Variable name | Description |
---|---|
`GP` | Indicator for “Gram Panchayat”, the level of local government studied |
`village` | Indicator for villages within GP |
`reserved` | Indicator for whether the GP was “reserved” for a female council head |
`female` | Indicator for whether the council head was female |
`irrigation` | Number of new or repaired irrigation systems in the village since new leader |
`water` | Number of new or repaired drinking water systems in the village since new leader |
For the following questions, try writing the relevant code to answer the question without looking at the solutions.
- Check whether or not the reservation policy was effectively implemented by seeing whether those GPs that were reserved did in fact have female politicians elected. Specifically, calculate the proportion of female leaders elected for reserved and unreserved GPs. What do you conclude?
Code Hint: You will need to use the subsetting operators that we used last week.
- Calculate the estimated average treatment effect of reserved GPs for both `irrigation` and `water`.
- Calculate the standard error of the difference in means for both `irrigation` and `water`. Code Hint: You can calculate the variance of a vector by using the `var` function. Remember also that to subset a vector you can use square brackets: `my_vector[1:10]`. Finally, the `length` function will allow you to calculate how many elements there are in any vector, or any subset of a vector.
Recall that \(\widehat{SE}_\text{ATE} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0}}\)
- Using the values you have just computed, calculate the test statistics for the difference in means and conduct a hypothesis test against the null hypothesis that the average treatment effect of a female-led council is zero (again, for both `irrigation` and `water`). Assume that the sampling distribution of the test statistic under the null hypothesis is well approximated by the standard normal distribution. Conduct your test at the 95% confidence level.
- Calculate the confidence intervals for these differences in means.
- What do the conclusions of these tests suggest about the effects of female leadership on policy outcomes?
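The by-hand steps in the questions above can be sketched on made-up data. Everything here (`toy`, `treated`, `outcome`, and the simulated values) is hypothetical and is not the `women.csv` data, so the numbers say nothing about the seminar answers. The sketch only shows how subsetting, `mean`, `var`, `length` and `sqrt` fit together.

```r
set.seed(50)

# Made-up data: 200 units, half of them randomly assigned to treatment
n <- 200
toy <- data.frame(treated = sample(rep(c(0, 1), each = n / 2)))
toy$outcome <- rpois(n, lambda = 3 + 1 * toy$treated)

# Subset the outcome by treatment status using square brackets
y1 <- toy$outcome[toy$treated == 1]
y0 <- toy$outcome[toy$treated == 0]

# Difference in means: the estimated average treatment effect
ate_hat <- mean(y1) - mean(y0)

# Standard error of the difference in means
se_hat <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))

# Test statistic under the null hypothesis of a zero ATE
z_stat <- ate_hat / se_hat

# Two-sided p-value using the standard normal approximation
p_value <- 2 * pnorm(-abs(z_stat))

# 95% confidence interval using the normal critical value (about 1.96)
ci_95 <- ate_hat + c(-1, 1) * qnorm(0.975) * se_hat

c(ate = ate_hat, se = se_hat, z = z_stat, p = p_value)
ci_95
```

The same pattern applies to a 0/1 variable: the mean of an indicator within a subset is the proportion of ones in that subset.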
T-tests in R
It is relatively laborious to go through those steps each time you want to conduct a hypothesis test, and so normally we would just use functions built into R that allow us to do this more easily. The syntax for the main arguments for specifying a t-test in R is:
t.test(x, y, alt, mu, conf)
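Let's have a look at the arguments. Note that `alt` and `conf` are abbreviations: R matches them to the full argument names `alternative` and `conf.level`.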
Arguments | Description |
---|---|
`x` | A vector of values from one group of observations |
`y` | A vector of values from a different group of observations |
`mu` | The value for the difference in means null hypothesis. The default value is 0, but could take on other values if required |
`alt` | There are two alternatives to the null hypothesis that the difference in means is zero. The difference could either be smaller or it could be larger than zero. To test against both alternatives, we set `alt = "two.sided"`. |
`conf` | Here, we set the level of confidence that we want in rejecting the null hypothesis. Common confidence levels are 95%, 99%, and 99.9%. |
- Using the `t.test` function, check that your answer to question 4 is correct. That is, use the `t.test` function to conduct hypothesis tests that the ATE of a female-led council is zero for both irrigation and drinking water investment.
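As an illustration of the call, here is a sketch on simulated vectors (`y1` and `y0` are made up and stand in for the treated and control outcomes). It uses the full argument names `alternative` and `conf.level`, and checks that the test statistic matches the by-hand formula.

```r
set.seed(50)

# Made-up outcomes for a treated group and a control group
y1 <- rnorm(100, mean = 1)
y0 <- rnorm(100, mean = 0)

# Two-sided test of a zero difference in means at the 95% confidence level
tt <- t.test(x = y1, y = y0, alternative = "two.sided", mu = 0, conf.level = 0.95)
tt

# The same test statistic computed by hand
se_hat <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
t_by_hand <- (mean(y1) - mean(y0)) / se_hat

# Compare the two (unname drops the "t" label attached by t.test)
all.equal(unname(tt$statistic), t_by_hand)
```

By default `t.test` does not assume equal variances and refers the statistic to a t distribution rather than the standard normal, so its p-values and confidence intervals can differ very slightly from the normal-approximation results.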
Linear regression in R
Another approach to analysing experimental data is to specify a linear regression where we model our two outcome variables (`irrigation`, `water`) as a function of the treatment variable (`reserved`). Recall that in this setup, the estimated coefficient on the treatment variable will be equal to the difference in means we calculated above (and the standard error, confidence intervals, and p-values will also all follow through as above).
We run linear regressions using the `lm()` function in R (`lm` stands for Linear Model). The `lm()` function needs to know a) the relationship we’re trying to model and b) the dataset for our observations. The two arguments we need to provide to the `lm()` function are described below.
Argument | Description |
---|---|
`formula` | The formula describes the relationship between the dependent and independent variables, for example `dependent.variable ~ independent.variable` |
`data` | The name of the dataset that contains the variables of interest. |
For more information on how the `lm()` function works, type `help(lm)` in R.
- Specify linear models for `water` and `irrigation` as a function of `reserved`. Assign the output of these models to objects with sensible names. Use the `summary` function on these objects to examine the coefficients, standard errors and p-values.
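The equivalence between the regression coefficient and the difference in means can be checked on simulated data. The sketch below uses a made-up data.frame (`toy`, with hypothetical columns `treated` and `outcome`) rather than the seminar data.

```r
set.seed(50)

# Made-up experiment: 200 units, half randomly treated, true effect of 0.5
n <- 200
toy <- data.frame(treated = sample(rep(c(0, 1), each = n / 2)))
toy$outcome <- 2 + 0.5 * toy$treated + rnorm(n)

# Regress the outcome on the binary treatment indicator
toy_model <- lm(outcome ~ treated, data = toy)
summary(toy_model)

# Difference in means computed directly
diff_in_means <- mean(toy$outcome[toy$treated == 1]) -
  mean(toy$outcome[toy$treated == 0])

# The coefficient on the treatment equals the difference in means
all.equal(unname(coef(toy_model)["treated"]), diff_in_means)
```

The intercept is the control-group mean. Note that the default standard errors from `lm` assume a common variance in both groups, so they can differ slightly from the unequal-variance formula when the group sizes differ.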
2.1.2 Reanalysis of Gerber, Green and Larimer (2008)
‘Why do large numbers of people vote, despite the fact that, as Hegel once observed, “the casting of a single vote is of no significance where there is a multitude of electors”?’
This is the question that drives the experimental analysis of Gerber, Green and Larimer (2008). If it is irrational to vote because the costs of doing so (time spent informing oneself, time spent getting to the polling station, etc.) are clearly greater than the gains to be made from voting (the probability that any individual voter will be decisive in an election is vanishingly small), then why do we observe millions of people voting in elections? One commonly proposed answer is that voters may have some sense of civic duty which drives them to the polls. Gerber, Green and Larimer investigate this idea empirically by priming voters to think about civic duty while also varying the amount of social pressure voters are subject to.
In a field experiment in advance of the 2006 primary election in Michigan, nearly 350,000 voters were assigned at random to one of four treatment groups, where voters received mailouts which encouraged them to vote, or a control group where voters received no mailout. The treatment and control conditions were as follows:
- Treatment 1 (“Civic duty”): Voters receive mailout reminding them that voting is a civic duty
- Treatment 2 (“Hawthorne”): Voters receive mailout telling them that researchers would be studying their turnout based on public records
- Treatment 3 (“Self”): Voters receive mailout displaying the record of turnout for their household in prior elections.
- Treatment 4 (“Neighbors”): Voters receive mailout displaying the record of turnout for their household and their neighbours’ households in prior elections.
- Control: Voters receive no mailout.
Load the replication data for Gerber, Green and Larimer (2008). This data is stored in a `.Rdata` format, which is the main way to save data in R. Therefore you will not be able to use `read.csv` but instead should use the function `load`.
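To see how `load` behaves, the sketch below saves a small made-up object to a temporary `.Rdata` file and restores it. The object `toy` and the file name `toy.Rdata` are hypothetical. For the seminar you would instead pass the path of the downloaded replication file.

```r
# Toy object standing in for a real dataset
toy <- data.frame(id = 1:3, voted = c(1, 0, 1))

# Save it to a temporary .Rdata file (a real file would live in your data folder)
toy_path <- file.path(tempdir(), "toy.Rdata")
save(toy, file = toy_path)

# Remove the object, then restore it from disk
rm(toy)
loaded_names <- load(toy_path)

# load() recreates the saved objects and returns their names
loaded_names
str(toy)
```

Unlike `read.csv`, there is no need to assign the result with `<-`: the object reappears in your environment under the name it was saved with.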
Once you have loaded the data, familiarise yourself with the `gerber` object which should be in your current environment. Use the `str` and `summary` functions to get an idea of what is in the data. There are 5 variables in this data.frame:
Variable name | Description |
---|---|
`voted` | Indicator for whether the voter voted in the 2006 election (1) or did not vote (0) |
`treatment` | Factor variable indicating which treatment arm (or control group) the voter was allocated to |
`sex` | Sex of the respondent |
`yob` | Year of birth of the respondent |
`p2004` | Indicator for whether the voter voted in the 2004 election (Yes) or not (No) |
- Calculate the turnout rates for each of the experimental groups (4 treatments, 1 control). Calculate the number of individuals allocated to each group. Recreate table 2 on p. 38 of the paper.
- Conduct a series of t-tests between each treatment condition and the control condition. Present the results of the t-tests either as confidence intervals for the difference in means, or as a p-value for the null hypothesis that mean turnout is the same in the treatment and control groups, \(E[Y_t] = E[Y_c]\).
- Use the following code to create three new variables in the data.frame. First, a variable that is equal to 1 if a respondent is female, and 0 otherwise. Second, a variable that measures the age of each voter in years at the time of the experiment (which was conducted in 2006). Third, a variable that is equal to 1 if the voter voted in the November 2004 general election, and 0 otherwise.
## Female dummy variable
gerber$female <- ifelse(gerber$sex == "female", 1, 0)
## Age variable
gerber$age <- 2006 - gerber$yob
## 2004 variable
gerber$turnout04 <- ifelse(gerber$p2004 == "Yes", 1, 0)
Using these variables, conduct balance checks to establish whether there are potentially confounding differences between treatment and control groups.
- Estimate the average treatment effects of the different treatment arms whilst controlling for the variables you created for the question above. How do these estimates differ from regression estimates of the treatment effects only (i.e. without controlling for other factors)? Why?
- Estimate the treatment effects separately for men and women. Do you note any differences in the impact of the treatment amongst these subgroups?
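The general pattern behind the last few questions (group summaries, balance checks, regression with and without covariates, and subgroup models) can be sketched on simulated data. Everything below is made up: the data.frame `toy`, the arm labels `arm_a` and `arm_b`, and the effect sizes are hypothetical and do not correspond to the `gerber` data or its factor levels.

```r
set.seed(50)

# Made-up three-arm experiment with two pre-treatment covariates
n <- 5000
arm_labels <- c("control", "arm_a", "arm_b")
toy <- data.frame(
  treatment = factor(sample(arm_labels, n, replace = TRUE), levels = arm_labels),
  female = rbinom(n, 1, 0.5),
  age = sample(20:80, n, replace = TRUE)
)

# Simulated turnout depends on the arm (effects 0.02 and 0.08) and on age
p_vote <- 0.25 + 0.02 * (toy$treatment == "arm_a") +
  0.08 * (toy$treatment == "arm_b") + 0.002 * (toy$age - 50)
toy$voted <- rbinom(n, 1, p_vote)

# Group sizes and turnout rates for each arm
table(toy$treatment)
tapply(toy$voted, toy$treatment, mean)

# Balance check: covariate means should be similar across randomized arms
aggregate(cbind(female, age) ~ treatment, data = toy, FUN = mean)

# Alternative balance check: regress a covariate on the treatment factor
summary(lm(age ~ treatment, data = toy))

# Treatment effects without and with covariate adjustment
m_simple <- lm(voted ~ treatment, data = toy)
m_adjust <- lm(voted ~ treatment + female + age, data = toy)
round(cbind(simple = coef(m_simple)[2:3], adjusted = coef(m_adjust)[2:3]), 3)

# Subgroup estimates: fit the same model separately for women and men
m_women <- lm(voted ~ treatment, data = toy, subset = female == 1)
m_men <- lm(voted ~ treatment, data = toy, subset = female == 0)
coef(m_women)
coef(m_men)
```

Because the first factor level (`control` here) is the reference category, each treatment coefficient is the difference in mean turnout between that arm and the control group.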