2 Randomized Experiments


2.1 Lecture review and reading

The main motivation for using randomized experiments is that when treatments are randomly assigned, both observed and unobserved potentially confounding factors will be balanced across treatment and control conditions in expectation. That is, randomization on average solves the selection bias problem that we outlined last week. The intuition behind this result is nicely described in both the Mostly Harmless book (Chapter 2) and in the Mastering ’Metrics book (see, in particular, Chapter 1). As Angrist and Pischke put it, “Random assignment works not by eliminating individual differences but rather by ensuring that the mix of individuals being compared is the same.” (MM, p. 16)

For a review of statistical inference (sampling distributions, t-tests, standard errors, etc.), the Mastering ’Metrics book has a nice appendix on pages 33-46. The Gerber and Green book (chapter 3) is also very useful and clear, and is particularly good for contrasting randomization inference with classical statistical methods. Randomization inference is also discussed and applied in the Kalla and Broockman (2016) paper that we discussed in the lecture.

Chattopadhyay and Duflo (2004) is an excellent example of using the randomized nature of a real-life policy implementation to draw conclusions about an important political science question. For the purposes of this course you can ignore the theoretical section of the paper (though it’s worth reading, as it’s an interesting model); in short, it concludes that we should expect differences in policy outcomes between areas that are and are not governed by female Pradhans (village chiefs). Instead you should focus on a) the detailed description of the randomization procedure, and b) what the authors did to ensure that randomization of treatment and control conditions was successful.

Randomized experiments are playing an increasingly important role in policy-making, and it is worth having a look at the Test, Learn, Adapt paper produced by the Behavioural Insights Team and the Cabinet Office, which represents a call-to-arms for experimental methods in developing better public policy. In addition to situating the experimental methods we study in a broader policy-making context, this paper has a nice set of examples of successful public policy experiments that have been conducted over the past 20 years.


2.2 Seminar

The main statistical machinery for analysing randomized experiments should be familiar to you all: t-tests and linear regression. The main objective for this session, then, is to learn how to implement these in R. Fortunately, doing so is very straightforward, as there are standard functions for both. We will also need a number of other functions today, most of which are listed in the table below.

Function Purpose
mean Calculate the mean of a vector of numbers
var Calculate the variance of a vector of numbers
sqrt Calculate the square-root of a number or vector of numbers
length Calculate how many elements there are in a vector of numbers
pnorm Evaluate the cumulative distribution function (CDF) of the standard normal distribution at an input value
t.test Conduct a t-test
lm Estimate a linear regression model

Some of these functions are explained in more detail below. Remember, if you want to know how to use a particular function you can type ?function_name or help(function_name), or you can Google it!
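For example, pnorm(1.96) returns the probability that a standard normal variable falls below 1.96, a value that will reappear when we calculate p-values and confidence intervals later in this session:

## Probability that a standard normal variable falls below 1.96
pnorm(1.96)
[1] 0.9750021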

2.2.1 Data

As our running example for the seminar, we will use (a simplified version of) the data from Chattopadhyay and Duflo (2004). We will also be using the data from the Gerber, Green and Larimer (2008) study on social pressure and turnout. Download these datasets from these links, then put them into the folder that you are using for this week.

You should start your script each week with code similar to the following:

rm(list = ls())
setwd("path_to_my_folder")
  • rm(list = ls()) is just telling R to remove everything from your current environment. For instance, if you create an object like we did in last week’s seminar, and then you run rm(list = ls()), that object will disappear from the environment panel in RStudio and you will no longer be able to access it. We normally put this line at the top of each script we work with so that we are beginning our analysis fresh each time.
  • setwd("path_to_my_folder") tells R that you would like to work from (“set”) the folder (or, “working directory”) of your choice. For example, I am keeping the code for this week in my week2 folder, which is in my PUBL0050 folder, which is in my Teaching folder, which is stored in my Dropbox folder. So I would use setwd("~/Dropbox/Teaching/PUBL0050/week2"). You should make sure that both the code you write each week, and the data for that week are always stored in the same folder.
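A simple way to check that R is looking in the right place is to print the current working directory and list the files that R can see there:

## Print the current working directory
getwd()

## List the files in the working directory (this week's data files should appear here)
list.files()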

2.2.2 Female politicians and policy outcomes – Chattopadhyay and Duflo (2004)

Chattopadhyay and Duflo ask whether there is a causal effect of having female politicians in government on public policy outcomes. That is, they ask whether women promote different policies than men. Cross-sectional comparisons – i.e. comparisons between political authorities with male and female leaders – are unlikely to result in unbiased estimates of the causal effect of interest, because different types of political areas are likely to differ in many ways other than just the gender of the political leader. For example, it is probably the case that more liberal districts will, on average, elect more female politicians, and so any difference in policy outcomes might be attributable to either politicians’ gender, or to district ideology.

To overcome this problem, Chattopadhyay and Duflo rely on the fact that in the mid-1990s, one-third of local councils in India (known as Gram Panchayats, or GPs) were randomly assigned to be “reserved” for leadership by female politicians. For each of these councils, the authors selected two villages in which to measure public policy outcomes. We will study this data below. Once you have downloaded the data and saved it to your computer, set your working directory to the folder in which that file is stored and then load the women.csv file into R using the read.csv function:

women <- read.csv("women.csv")

Now explore the data using some of the functions we learnt last week, and that you will have become familiar with during the homework. Try some of the following:

str(women)
head(women)
summary(women)

As you will see, there are 6 variables in this data.frame:

Variable name Description
GP Indicator for “Gram Panchayat”, the level of local government studied
village Indicator for villages within GP
reserved Indicator for whether the GP was “reserved” for a female council head
female Indicator for whether the council head was female
irrigation Number of new or repaired irrigation systems in the village since new leader
water Number of new or repaired drinking water systems in the village since new leader

For the following questions, try writing the relevant code to answer the question without looking at the solutions. If you get stuck, or want to check your answers, then click the “Reveal answer” button, which will reveal the relevant code.

Question 1. Check whether or not the reservation policy was effectively implemented by seeing whether those GPs that were reserved did in fact have female politicians elected. Specifically, calculate the proportion of female leaders elected for reserved and unreserved GPs. What do you conclude?

Reveal answer

## Calculate the mean of female for those observations that were "reserved"
mean(women$female[women$reserved == 1])
[1] 1

## Calculate the mean of female for those observations that were "unreserved"
mean(women$female[women$reserved == 0])
[1] 0.07476636

The reservation policy appears to have been followed correctly. All reserved GPs are led by women, compared with only around 7% of unreserved GPs.

Question 2. Calculate the estimated average treatment effect for both irrigation and water.

Reveal answer

## ATE for drinking-water facilities
water_ate <- mean(women$water[women$reserved == 1]) - mean(women$water[women$reserved == 0])

## ATE for irrigation facilities
irrigation_ate <- mean(women$irrigation[women$reserved == 1]) - mean(women$irrigation[women$reserved == 0])

water_ate
[1] 9.252423
irrigation_ate
[1] -0.3693319

Question 3. Calculate the standard error of the difference in means for both irrigation and water. (Hint: You can calculate the variance of a vector by using the var function. Remember also that to subset a vector you can use square brackets: my_vector[1:10]. Finally, the length function will allow you to calculate how many elements there are in any vector, or any subset of a vector.)

Reveal answer

Recall that \(\widehat{SE}_\text{ATE} = \sqrt{\frac{\hat{\sigma}_1^2}{N_1} + \frac{\hat{\sigma}_0^2}{N_0}}\), where \(\hat{\sigma}_1^2\) and \(\hat{\sigma}_0^2\) are the sample variances of the outcome in the treatment and control groups, and \(N_1\) and \(N_0\) are the respective group sizes.

# Calculate the number of observations in the treatment and control groups
n_treat <- length(women$water[women$reserved == 1])
n_control <- length(women$water[women$reserved == 0])

## Calculate the standard error for the drinking-water facilities ATE
water_se <- sqrt(
  (var(women$water[women$reserved == 1])/n_treat) +
    (var(women$water[women$reserved == 0])/n_control)
)

## Calculate the standard error for the irrigation facilities ATE
irrigation_se <- sqrt(
  (var(women$irrigation[women$reserved == 1])/n_treat) +
    (var(women$irrigation[women$reserved == 0])/n_control)
)

water_se
[1] 5.100282
irrigation_se
[1] 0.9674094

Question 4. Using the values you have just calculated, conduct a hypothesis test against the null hypothesis that the average treatment effect of a female-led council is zero (again, for both irrigation and water). Assume that the sampling distribution of the test statistic under the null hypothesis is well approximated by the standard normal distribution (i.e. you can use pnorm to work out the relevant p-values).

Reveal answer

## Calculate the t-statistics
water_t_stat <- water_ate/water_se

irrigation_t_stat <- irrigation_ate/irrigation_se

## Calculate the two-sided p-values (taking the absolute value of the
## t-statistic so that the same formula works whether the estimate is positive or negative)
water_p_value <- (1 - pnorm(abs(water_t_stat)))*2
irrigation_p_value <- (1 - pnorm(abs(irrigation_t_stat)))*2

water_p_value
[1] 0.06966231
irrigation_p_value
[1] 0.7026289

Question 5. Calculate the confidence intervals for these differences in means.

Reveal answer

# Calculate the confidence intervals
water_upper_bound <- water_ate + 1.96*water_se
water_lower_bound <- water_ate - 1.96*water_se

irrigation_upper_bound <- irrigation_ate + 1.96*irrigation_se
irrigation_lower_bound <- irrigation_ate - 1.96*irrigation_se

# Present the results in a data.frame
out <- data.frame(outcome = c("Water","Irrigation"),
           ate = c(water_ate,irrigation_ate), 
           upper_ci = c(water_upper_bound,irrigation_upper_bound), 
           lower_ci = c(water_lower_bound, irrigation_lower_bound))

out
     outcome        ate  upper_ci   lower_ci
1      Water  9.2524230 19.248977 -0.7441306
2 Irrigation -0.3693319  1.526791 -2.2654545

Question 6. What do the conclusions of these tests suggest about the effects of female leadership on policy outcomes?

Reveal answer

The reservation policy appears to have had no effect on the number of irrigation systems in villages, but seems to have had a positive effect on the number of drinking water facilities. In particular, our best estimate of the average treatment effect suggests that the reservation policy increased the number of drinking water facilities in a GP by about 9 on average. That said, the estimates are sufficiently uncertain that we cannot reject the null hypothesis of no effect at the 95% confidence level for either of the outcome variables.

2.2.3 T-tests in R

It is relatively laborious to go through those steps each time you want to conduct a hypothesis test, so normally we would just use the functions built into R that allow us to do this more easily. The syntax for the main arguments for specifying a t-test in R is:

t.test(x, y, alt, mu, conf)

Let’s have a look at the arguments.

Argument Description
x A vector of values from one group of observations
y A vector of values from a different group of observations
mu The value for the difference in means null hypothesis. The default value is 0, but could take on other values if required
alt There are two alternatives to the null hypothesis that the difference in means is zero. The difference could either be smaller or it could be larger than zero. To test against both alternatives, we set alt = "two.sided".
conf The confidence level used for the reported confidence interval. Common confidence levels are 95%, 99%, and 99.9%.

Question 7. Using the t.test function, check that your answer to question 4 is correct. That is, use the t.test function to conduct hypothesis tests that the ATE of a female-led council is zero for both irrigation and drinking water investment.

Reveal answer

water_t_test <- t.test(x = women$water[women$reserved==1], 
       y = women$water[women$reserved==0],
       mu = 0,
       alt = "two.sided",
       conf = 0.95)

irrigation_t_test <- t.test(x = women$irrigation[women$reserved==1], 
       y = women$irrigation[women$reserved==0],
       mu = 0,
       alt = "two.sided",
       conf = 0.95)

water_t_test

    Welch Two Sample t-test

data:  women$water[women$reserved == 1] and women$water[women$reserved == 0]
t = 1.8141, df = 122.05, p-value = 0.07212
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8440572 19.3489031
sample estimates:
mean of x mean of y 
 23.99074  14.73832 
irrigation_t_test

    Welch Two Sample t-test

data:  women$irrigation[women$reserved == 1] and women$irrigation[women$reserved == 0]
t = -0.38177, df = 306.96, p-value = 0.7029
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.272925  1.534261
sample estimates:
mean of x mean of y 
 3.018519  3.387850 

The p-values for the difference in means using the t.test function are very similar to those we calculated manually. The t.test p-value for the water ATE is 0.072119 compared to 0.069662 from the manual calculation. The t.test p-value for the irrigation ATE is 0.702893 compared to 0.702629 from the manual calculation. The small differences here are attributable to the fact that we used the standard normal distribution to calculate the manual values, while t.test uses the t-distribution. (If you are curious, you can replicate the exact p-values by using the pt() function with the appropriate test-statistic values and degrees of freedom.)
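As a quick sketch of that replication (this assumes the t-statistics from Question 4 and the t.test objects from Question 7 are still in your environment; the Welch degrees of freedom are stored in the $parameter element of each t.test object):

## Replicate the t.test p-values using the t-distribution
2 * (1 - pt(abs(water_t_stat), df = water_t_test$parameter))
2 * (1 - pt(abs(irrigation_t_stat), df = irrigation_t_test$parameter))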

2.2.4 Linear regression in R

Another approach to analysing experimental data is to specify a linear regression in which we model our two outcome variables (irrigation, water) as a function of the treatment variable (reserved). Recall that in this setup, the estimated coefficient on the treatment variable will be equal to the difference in means we calculated above. The standard errors and p-values will be close but not identical, because by default lm assumes a common error variance across groups, whereas the Welch t-test we used above does not.
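To see why the coefficient equals the difference in means, note that in the bivariate regression \(Y_i = \alpha + \beta D_i + \epsilon_i\), where \(D_i\) is the binary treatment indicator, the OLS estimates are \(\hat{\alpha} = \bar{Y}_0\) (the mean outcome in the control group) and \(\hat{\beta} = \bar{Y}_1 - \bar{Y}_0\) (the difference in mean outcomes between treated and control observations).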

We run linear regressions using the lm() function in R (lm stands for Linear Model). The lm() function needs to know a) the relationship we’re trying to model and b) the dataset for our observations. The two arguments we need to provide to the lm() function are described below.

Argument Description
formula The formula describes the relationship between the dependent and independent variables, for example dependent.variable ~ independent.variable
data The name of the dataset that contains the variables of interest.

For more information on how the lm() function works, type help(lm) in R.

Question 8. Specify linear models for water and irrigation as a function of reserved. Assign the output of these models to objects with sensible names. Use the summary function on these objects to examine the coefficients, standard errors and p-values.

Reveal answer

# Estimate linear models
water_lm <- lm(water ~ reserved, data = women)
irrigation_lm <- lm(irrigation ~ reserved, data = women)

# Summarize output
summary(water_lm)

Call:
lm(formula = water ~ reserved, data = women)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.991 -14.738  -7.865   2.262 316.009 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   14.738      2.286   6.446 4.22e-10 ***
reserved       9.252      3.948   2.344   0.0197 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 33.45 on 320 degrees of freedom
Multiple R-squared:  0.01688,   Adjusted R-squared:  0.0138 
F-statistic: 5.493 on 1 and 320 DF,  p-value: 0.0197
summary(irrigation_lm)

Call:
lm(formula = irrigation ~ reserved, data = women)

Residuals:
   Min     1Q Median     3Q    Max 
-3.388 -3.388 -3.019 -1.019 86.612 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.3879     0.6498   5.214 3.33e-07 ***
reserved     -0.3693     1.1220  -0.329    0.742    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.506 on 320 degrees of freedom
Multiple R-squared:  0.0003385, Adjusted R-squared:  -0.002785 
F-statistic: 0.1084 on 1 and 320 DF,  p-value: 0.7422

The regression estimate of the difference in means (i.e. \(\beta_\text{reserved}\)) is 9.252423 for the drinking water outcome and -0.3693319 for the irrigation outcome, which are the same as the manually calculated differences.
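Relatedly, if you want the regression-based analogue of the confidence intervals from Question 5, the coef and confint functions extract the relevant quantities directly from a fitted model object. For example:

## Extract the treatment coefficient and its 95% confidence interval
coef(water_lm)["reserved"]
confint(water_lm, "reserved", level = 0.95)

Note that confint uses the regression’s equal-variance standard error, so this interval will differ somewhat from the Welch-based interval calculated in Question 5.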

2.3 Homework

Problem 1: Reanalysis of Gerber, Green and Larimer (2008)

‘Why do large numbers of people vote, despite the fact that, as Hegel once observed, “the casting of a single vote is of no significance where there is a multitude of electors”?’

This is the question that drives the experimental analysis of Gerber, Green and Larimer (2008). If it is irrational to vote because the costs of doing so (time spent informing oneself, time spent getting to the polling station, etc.) are clearly greater than the gains to be made from voting (the probability that any individual voter will be decisive in an election is vanishingly small), then why do we observe millions of people voting in elections? One commonly proposed answer is that voters may have some sense of civic duty which drives them to the polls. Gerber, Green and Larimer investigate this idea empirically by priming voters to think about civic duty while also varying the amount of social pressure voters are subject to.

In a field experiment in advance of the 2006 primary election in Michigan, nearly 350,000 voters were assigned at random to one of four treatment groups, where voters received mailouts which encouraged them to vote, or a control group where voters received no mailout. The treatment and control conditions were as follows:

  • Treatment 1 (“Civic duty”): Voters receive a mailout reminding them that voting is a civic duty.
  • Treatment 2 (“Hawthorne”): Voters receive a mailout telling them that researchers would be studying their turnout based on public records.
  • Treatment 3 (“Self”): Voters receive a mailout displaying the record of turnout for their household in prior elections.
  • Treatment 4 (“Neighbors”): Voters receive a mailout displaying the record of turnout for their household and their neighbours’ households in prior elections.
  • Control: Voters receive no mailout.

Load the replication data for Gerber, Green and Larimer (2008). This data is stored in .Rdata format, R’s native format for saved objects. You will therefore not be able to use read.csv; instead, use the load function.

load("gerber_green_larimer.Rdata")

Once you have loaded the data, familiarise yourself with the gerber object, which should now be in your environment. Use the str and summary functions to get an idea of what is in the data. There are 5 variables in this data.frame:

Variable name Description
voted Indicator for whether the voter voted in the 2006 election (1) or did not vote (0)
treatment Factor variable indicating which treatment arm (or control group) the voter was allocated to
sex Sex of the respondent
yob Year of birth of the respondent
p2004 Indicator for whether the voter voted in the 2004 election (Yes) or not (No)
  1. Calculate the turnout rates for each of the experimental groups (4 treatments, 1 control). Calculate the number of individuals allocated to each group. Recreate table 2 on p. 38 of the paper.

    Solution

    Here is one (somewhat laborious) way of constructing the table:

    ## Calculate the mean outcome for each condition
    y_bar_control <- mean(gerber$voted[gerber$treatment == "Control"])
    y_bar_civic <- mean(gerber$voted[gerber$treatment == "Civic Duty"])
    y_bar_hawthorne <- mean(gerber$voted[gerber$treatment == "Hawthorne"])
    y_bar_self <- mean(gerber$voted[gerber$treatment == "Self"])
    y_bar_neighbor <- mean(gerber$voted[gerber$treatment == "Neighbors"])
    
    ## Calculate the total number of observations for each condition
    n_control <- sum(gerber$treatment == "Control")
    n_civic <- sum(gerber$treatment == "Civic Duty")
    n_hawthorne <- sum(gerber$treatment == "Hawthorne")
    n_self <- sum(gerber$treatment == "Self")
    n_neighbor <- sum(gerber$treatment == "Neighbors")
    
    ## Concatenate into two vectors (using "round" to round the percentages to one decimal place)
    percentages <- round(c(y_bar_control,y_bar_civic,y_bar_hawthorne, y_bar_self, y_bar_neighbor)*100,1)
    
    totals <- c(n_control, n_civic, n_hawthorne, n_self, n_neighbor)
    
    ## Combine into a data.frame object
    table_two <- data.frame(rbind(percentages, totals))
    
    ## Provide the correct names
    rownames(table_two) <- c("Percentage voting", "N of individuals")
    colnames(table_two) <- c("Control", "Civic Duty", "Hawthorne", "Self", "Neighbors")
    
    print(table_two)
                       Control Civic Duty Hawthorne    Self Neighbors
    Percentage voting     29.7       31.5      32.2    34.5      37.8
    N of individuals  191243.0    38218.0   38204.0 38218.0   38201.0

    Here is an alternative that is more efficient, though the code may be less readable and take more work to figure out:

    ## Calculate the mean outcome for each condition using the aggregate function
    y_bars <- aggregate(gerber$voted, list(gerber$treatment), FUN = function(x) round(mean(x)*100,1))
    
    ## Calculate the number of observations for each condition using the table function
    ns <- table(gerber$treatment)
    
    y_bars
         Group.1    x
    1    Control 29.7
    2 Civic Duty 31.5
    3  Hawthorne 32.2
    4       Self 34.5
    5  Neighbors 37.8
    ns
    
       Control Civic Duty  Hawthorne       Self  Neighbors 
        191243      38218      38204      38218      38201 

    One could of course then take these values and create a pretty table with them!
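    For instance, here is one minimal way to bind the two summaries into a single data.frame (with ns reordered by name so the counts line up with the rows of y_bars):

    ## Combine the aggregated percentages and group sizes into one table
    table_two_alt <- data.frame(condition = y_bars$Group.1,
                                percent_voting = y_bars$x,
                                n = as.vector(ns[as.character(y_bars$Group.1)]))
    table_two_alt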

  2. Conduct a series of t-tests between each treatment condition and the control condition. Present the results of the t-tests either as confidence intervals for the difference in means, or as p-values for the null hypothesis that \(\bar{Y}_c = \bar{Y}_t\).

    Solution

    t.test(x = gerber$voted[gerber$treatment == "Civic Duty"], 
           y = gerber$voted[gerber$treatment == "Control"])$conf.int
    [1] 0.01281368 0.02298501
    attr(,"conf.level")
    [1] 0.95
    t.test(x = gerber$voted[gerber$treatment == "Hawthorne"], 
           y = gerber$voted[gerber$treatment == "Control"])$conf.int
    [1] 0.02062181 0.03085081
    attr(,"conf.level")
    [1] 0.95
    t.test(x = gerber$voted[gerber$treatment == "Self"], 
           y = gerber$voted[gerber$treatment == "Control"])$conf.int
    [1] 0.04332558 0.05370080
    attr(,"conf.level")
    [1] 0.95
    t.test(x = gerber$voted[gerber$treatment == "Neighbors"], 
           y = gerber$voted[gerber$treatment == "Control"])$conf.int
    [1] 0.07603405 0.08658577
    attr(,"conf.level")
    [1] 0.95

    In all cases, the difference between the treatment and control condition is statistically significant at the 95% level.

  3. Create a variable that is equal to 1 if a respondent is female, and 0 otherwise. Create a second variable that measures the age of each voter in years at the time of the experiment (which was conducted in 2006). Create a third variable that is equal to 1 if the voter voted in the November 2004 election. Using these variables, conduct balance checks to establish whether there are potentially confounding differences between treatment and control groups. (Hint: you might find the ifelse function useful for creating the two dummy variables. And remember that a person born in 1986 would have been 20 in 2006.)

    Solution

    ## Female dummy variable
    gerber$female <- ifelse(gerber$sex == "female", 1, 0)
    
    ## Age variable
    gerber$age <- 2006 - gerber$yob
    
    ## 2004 variable
    gerber$turnout04 <- ifelse(gerber$p2004 == "Yes", 1, 0)
    
    ## Balance
    summary(lm(female ~ treatment, data = gerber))
    
    Call:
    lm(formula = female ~ treatment, data = gerber)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -0.5002 -0.4989 -0.4989  0.5011  0.5011 
    
    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
    (Intercept)         0.4989411  0.0011434 436.385   <2e-16 ***
    treatmentCivic Duty 0.0012420  0.0028016   0.443    0.658    
    treatmentHawthorne  0.0000642  0.0028020   0.023    0.982    
    treatmentSelf       0.0006402  0.0028016   0.229    0.819    
    treatmentNeighbors  0.0011243  0.0028021   0.401    0.688    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.5 on 344079 degrees of freedom
    Multiple R-squared:  9.655e-07, Adjusted R-squared:  -1.066e-05 
    F-statistic: 0.08305 on 4 and 344079 DF,  p-value: 0.9876
    summary(lm(age ~ treatment, data = gerber))
    
    Call:
    lm(formula = age ~ treatment, data = gerber)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -29.853  -8.814   0.186   9.186  56.295 
    
    Coefficients:
                        Estimate Std. Error  t value Pr(>|t|)    
    (Intercept)         49.81355    0.03304 1507.651   <2e-16 ***
    treatmentCivic Duty -0.15451    0.08096   -1.909   0.0563 .  
    treatmentHawthorne  -0.10875    0.08097   -1.343   0.1792    
    treatmentSelf       -0.02104    0.08096   -0.260   0.7950    
    treatmentNeighbors   0.03939    0.08097    0.486   0.6267    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 14.45 on 344079 degrees of freedom
    Multiple R-squared:  1.651e-05, Adjusted R-squared:  4.883e-06 
    F-statistic:  1.42 on 4 and 344079 DF,  p-value: 0.2244
    summary(lm(turnout04 ~ treatment, data = gerber))
    
    Call:
    lm(formula = turnout04 ~ treatment, data = gerber)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -0.4067 -0.4003 -0.4003  0.5997  0.6006 
    
    Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
    (Intercept)          0.4003388  0.0011209 357.147   <2e-16 ***
    treatmentCivic Duty -0.0008935  0.0027466  -0.325   0.7449    
    treatmentHawthorne   0.0028912  0.0027471   1.052   0.2926    
    treatmentSelf        0.0021417  0.0027466   0.780   0.4355    
    treatmentNeighbors   0.0063259  0.0027471   2.303   0.0213 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.4902 on 344079 degrees of freedom
    Multiple R-squared:  1.922e-05, Adjusted R-squared:  7.597e-06 
    F-statistic: 1.653 on 4 and 344079 DF,  p-value: 0.1578

    Looking at these three pre-treatment covariates, there is little evidence of imbalance across the treatment and control groups. There are no significant gender or age differences between the control group and any of the treatment groups. There is some evidence that a slightly higher proportion of voters turned out to vote in 2004 in the “Neighbors” treatment condition than in the control group (i.e. \(p < 0.05\)), but the difference is very small: turnout was about 0.6 percentage points higher in the treatment group than in the control group (where turnout was about 40%). Overall, these tables do not indicate any failures of randomization.

  4. Estimate the average treatment effects of the different treatment arms whilst controlling for the variables you created for the question above. How do these estimates differ from regression estimates of the treatment effects only (i.e. without controlling for other factors)? Why?

    Solution

    # Estimate a baseline model
    baseline_model <- lm(voted ~ treatment, data = gerber)
    
    # Estimate a model with covariates
    covariate_model <- lm(voted ~ treatment + female + age + turnout04, data = gerber)
    
    # Construct a data.frame with the treatment coefficients from each model
    coef_compare <- data.frame(baseline = coef(baseline_model)[2:5],
                               covariate = coef(covariate_model)[2:5])
    
    coef_compare
                          baseline  covariate
    treatmentCivic Duty 0.01789934 0.01865266
    treatmentHawthorne  0.02573631 0.02573836
    treatmentSelf       0.04851319 0.04828410
    treatmentNeighbors  0.08130991 0.08022561

    As expected from a randomized experiment, controlling for pre-treatment covariates has very little consequence for the estimated treatment effects. Because the covariates are balanced in expectation (and in this exact randomization there is also very little imbalance across the treatment arms), estimating the treatment effects conditional on covariates results in very similar estimates as the baseline estimates.

  5. Estimate the treatment effects separately for men and women. Do you note any differences in the impact of the treatment amongst these subgroups?

    Solution

    # Estimate regression models on subsets of data
    
    male_model <- lm(voted ~ treatment, data = gerber[gerber$female == 0,])
    female_model <- lm(voted ~ treatment, data = gerber[gerber$female == 1,])
    
    # Construct a data.frame with the treatment coefficients from each model
    coef_compare <- data.frame(male = coef(male_model)[2:5],
                               female = coef(female_model)[2:5])
    
    coef_compare
                              male     female
    treatmentCivic Duty 0.01994637 0.01588446
    treatmentHawthorne  0.02468701 0.02679139
    treatmentSelf       0.04575431 0.05129251
    treatmentNeighbors  0.08174818 0.08089951

    The treatment effects are in fact very similar between men and women. The largest difference in effect size is for the “Self” treatment condition, but even here the difference is only about half a percentage point. Both men and women seem equally likely to respond to appeals to civic duty and social pressure when making the decision to turn out to vote.

Problem 2 (difficult): Randomization inference

Using the Chattopadhyay and Duflo women.csv data, conduct a test of the sharp null hypothesis that the treatment effect on the drinking water outcome variable is zero for all observations. To construct a sampling distribution of the average treatment effect under the sharp null, you will need to simulate a large number of new randomization assignments (try 10000), and calculate and store the estimated average treatment effect for each of these randomizations. With these simulated average treatment effects in hand:

  1. Report the probability that we would observe a value as large as or larger than 9.252 (i.e. the ATE we estimated from the true randomization) if the true effect were zero for all observations.
  2. Plot a histogram of the ATEs under the null, with a vertical line indicating the ATE estimated from the true randomization.

    Hint: You can re-randomize the treatment assignment by using the sample function, which takes a vector as its first argument (the thing you would like to randomly sample from), and an integer as its second argument (the number of things you would like to randomly sample). You may also find it useful to use the replicate function, which allows you to repeatedly evaluate an expression or a function and outputs the results as a vector. Finally, hist will create a histogram, and abline will allow you to add lines to a plot.

    Solution

    # Define a function for a) resampling from the treatment vector and b) calculating the simulated ATE
    ate_sharp_null <- function(){
    
      women$reserved_tmp <- sample(x = women$reserved, size = nrow(women), replace = FALSE)
      ate_tmp <- mean(women$water[women$reserved_tmp==1]) - mean(women$water[women$reserved_tmp==0])  
      return(ate_tmp)
    
    }
    
    # Repeat the randomization process 10000 times, storing the ATEs as a vector
    sampling_dist <- replicate(10000, ate_sharp_null())
    
    # Find the proportion of simulated effects as large as or larger than the estimated ATE
    mean(sampling_dist >= water_ate)
    [1] 0.0121
    # Plot a histogram of the sampling distribution with a vertical line
    hist(sampling_dist)
    abline(v = water_ate)
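    Note that because the null distribution is simulated, the p-value will vary slightly from run to run. Calling set.seed with any fixed value before the replicate step makes the simulation reproducible:

    ## Fix the random seed (any value works) so the simulation is reproducible
    set.seed(1234)
    sampling_dist <- replicate(10000, ate_sharp_null())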

Problem 3: Statistical inference review

You are told that the difference in mean earnings between two groups of survey respondents is £470. The first group, let’s call them the Tigers, has 250 individuals and the standard deviation of their earnings is £2779. The second group, the Cats, also has 250 individuals and the standard deviation of their earnings is £3068. On average, the Tigers earn more than the Cats.

  1. What is the standard error of the difference in means between Tigers and Cats?

    Solution

    Denote the mean earnings of the Tigers as \(\bar{Y_t}\) and of the Cats as \(\bar{Y_c}\). The difference in means is:

    \(\bar{Y_t} - \bar{Y_c} = 470\)

    Now denote the standard deviation of Tigers’ earnings as \(s_t\), and \(s_c\) for the Cats, and equivalently \(n_t\) and \(n_c\) for the group sample sizes. The standard error of the difference in means is:

    \(SE(\bar{Y_t} - \bar{Y_c}) = \sqrt{\frac{s_t^2}{n_t} + \frac{s_c^2}{n_c}} = \sqrt{\frac{2779^2}{250} + \frac{3068^2}{250}} = 261.8\)

  2. What is the appropriate t-statistic relevant to this difference?

    Solution

    \(t = \frac{\bar{Y_t} - \bar{Y_c}}{SE(\bar{Y_t} - \bar{Y_c})} = \frac{470}{261.8} = 1.80\)

  3. What is the 95% confidence interval on the difference in mean earnings between Tigers and Cats?

    Solution

    95% Confidence Interval: \(470 \pm 1.96 \cdot 261.805\)

    Upper Bound: \(470 + 513.138 = 983.138\)

    Lower Bound: \(470 - 513.138 = -43.138\)
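    If you would like to check these hand calculations, the same numbers are easy to reproduce in R. A quick sketch using the figures given in the question:

    ## Standard error of the difference in means
    se_diff <- sqrt(2779^2/250 + 3068^2/250)
    se_diff
    
    ## t-statistic
    470/se_diff
    
    ## 95% confidence interval
    470 + c(-1.96, 1.96)*se_diff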