# 2 Randomized Experiments

## 2.1 Lecture review and reading

The main motivation for using randomized experiments is that when treatments are randomly assigned, both observed and unobserved potentially confounding factors will be balanced across treatment and control conditions in expectation. That is, randomization on average solves the selection bias problem that we outlined last week. The intuition behind this result is nicely described in both the Mostly Harmless book (Chapter 2) and the Mastering ’Metrics book (see, in particular, Chapter 1). As Angrist and Pischke put it, “Random assignment works not by eliminating individual differences but rather by ensuring that the mix of individuals being compared is the same.” (MM, p. 16)

For a review of statistical inference (sampling distributions, t-tests, standard errors, etc.), the Mastering ’Metrics book has a nice appendix on pages 33-46. The Gerber and Green book (Chapter 3) is also very useful and very clear, and is particularly good for contrasting randomization inference with classical statistical methods. Randomization inference is also discussed and applied in the Kalla and Broockman (2016) paper that we discussed in the lecture.

Chattopadhyay and Duflo (2004) is an excellent example of using the randomized nature of a real-life policy implementation to draw conclusions about an important political science question. For the purposes of this course you can ignore the theoretical section of the paper (though it’s worth reading, as it’s an interesting model), but in short it concludes that we should expect there to be differences in policy outcomes between areas that are and are not governed by female Pradhans (village chiefs). Instead you should focus on a) the detailed description of the randomization procedure, and b) what the authors did to ensure that randomization of treatment and control conditions was successful.

Randomized experiments are playing an increasingly important role in policy-making, and it is worth having a look at the Test, Learn, Adapt paper produced by the Behavioural Insights Team and the Cabinet Office, which represents a call-to-arms for experimental methods in developing better public policy. In addition to situating the experimental methods we study in a broader policy-making context, this paper has a nice set of examples of successful public policy experiments that have been conducted over the past 20 years.

## 2.2 Seminar

The main statistical machinery for analysing randomized experiments should be familiar to you all: t-tests and linear regression. The main objective for this session, then, is to learn how to implement these things in R. Fortunately, doing so is very straightforward, as there are standard functions for both. We will also need a number of other functions today, most of which are listed in the table below.

Function | Purpose |
---|---|
`mean` | Calculate the mean of a vector of numbers |
`var` | Calculate the variance of a vector of numbers |
`sqrt` | Calculate the square root of a number or vector of numbers |
`length` | Calculate how many elements there are in a vector of numbers |
`pnorm` | Calculate the cumulative probability of an input value from the CDF of the normal distribution |
`t.test` | Conduct a t-test |
`lm` | Estimate a linear regression model |

Some of these functions are explained in more detail below. Remember, if you want to know how to use a particular function you can type `?function_name` or `help(function_name)`, or you can Google it!

### 2.2.1 Data

As our running example for the seminar, we will use (a simplified version of) the data from Chattopadhyay and Duflo (2004). We will also be using the data from the Gerber, Green and Larimer (2008) study on social pressure and turnout. Download these datasets from these links, then put them into the folder that you are using for this week.

You should start your script each week with code similar to the following:

```
rm(list = ls())
setwd("path_to_my_folder")
```

`rm(list = ls())` tells R to remove *everything* from your current environment. For instance, if you create an object like we did in last week’s seminar and then run `rm(list = ls())`, that object will disappear from the environment panel in RStudio and you will no longer be able to access it. We normally put this line at the top of each script we work with so that we are beginning our analysis fresh each time.

`setwd("path_to_my_folder")` tells R that you would like to work from (“set”) the folder (or “working directory”) of your choice. For example, I am keeping the code for this week in my `week2` folder, which is in my `PUBL0050` folder, which is in my `Teaching` folder, which is stored in my `Dropbox` folder. So I would use `setwd("~/Dropbox/Teaching/PUBL0050/week2")`. You should make sure that both the code you write each week and the data for that week are always stored in the same folder.
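As a quick illustration of what `rm(list = ls())` does, try the following in a fresh R session (the object name here is invented purely for illustration):

```r
## Create an object, then clear the environment
my_object <- c(1, 2, 3)
exists("my_object")  # TRUE: the object is in your environment

rm(list = ls())      # remove everything from the environment
exists("my_object")  # FALSE: the object is gone
```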

### 2.2.2 Female politicians and policy outcomes – Chattopadhyay and Duflo (2004)

Chattopadhyay and Duflo ask whether there is a causal effect of having female politicians in government on public policy outcomes. That is, they ask whether women promote different policies than men. Cross-sectional comparisons – i.e. comparisons between political authorities with male and female leaders – are unlikely to result in unbiased estimates of the causal effect of interest, because different types of political areas are likely to differ in many ways other than just the gender of the political leader. For example, it is probably the case that more liberal districts will, on average, elect more female politicians, and so any difference in policy outcomes might be attributable to either politicians’ gender, or to district ideology.

To overcome this problem, Chattopadhyay and Duflo rely on the fact that in the mid-1990s, one-third of local councils in India (known as Gram Panchayat, or GPs) were randomly assigned to be “reserved” for leadership by female politicians. For each of these councils, the authors selected two villages to measure outcomes about public policy. We will study this data below. Once you have downloaded the data and saved it to your computer, set your working directory to the folder in which that file is stored and then load the `women.csv` file into R using the `read.csv` function:

`women <- read.csv("women.csv")`

Now explore the data using some of the functions we learnt last week, and that you will have become familiar with during the homework. Try some of the following:

```
str(women)
head(women)
summary(women)
```

As you will see, there are 6 variables in this data.frame:

Variable name | Description |
---|---|
GP | Indicator for “Gram Panchayat”, the level of local government studied |
village | Indicator for villages within GP |
reserved | Indicator for whether the GP was “reserved” for a female council head |
female | Indicator for whether the council head was female |
irrigation | Number of new or repaired irrigation systems in the village since new leader |
water | Number of new or repaired drinking water systems in the village since new leader |

For the following questions, try writing the relevant code to answer the question without looking at the solutions. If you get stuck, or want to check your answers, then click the “Reveal answer” button, which will reveal the relevant code.

**Question 1.** *Check whether or not the reservation policy was effectively implemented by seeing whether those GPs that were reserved did in fact have female politicians elected. Specifically, calculate the proportion of female leaders elected for reserved and unreserved GPs. What do you conclude?*

## Reveal answer

```
## Calculate the mean of female for those observations that were "reserved"
mean(women$female[women$reserved == 1])
## Calculate the mean of female for those observations that were "unreserved"
mean(women$female[women$reserved == 0])
```

```
[1] 1
[1] 0.07476636
```

The reservation policy appears to have been followed correctly: all reserved GPs are led by women. This contrasts with only 7% of unreserved GPs.

**Question 2.** *Calculate the estimated average treatment effect for both irrigation and water.*

## Reveal answer

```
## ATE drinking-water facilities
water_ate <- mean(women$water[women$reserved == 1]) - mean(women$water[women$reserved == 0])
## irrigation facilities
irrigation_ate <- mean(women$irrigation[women$reserved == 1]) - mean(women$irrigation[women$reserved == 0])
water_ate
```

`[1] 9.252423`

`irrigation_ate`

`[1] -0.3693319`

**Question 3.** *Calculate the standard error of the difference in means for both irrigation and water.* (**Hint:** You can calculate the variance of a vector by using the `var` function. Remember also that to subset a vector you can use square brackets: `my_vector[1:10]`. Finally, the `length` function will allow you to calculate how many elements there are in any vector, or any subset of a vector.)

## Reveal answer

Recall that \(\widehat{SE}_\text{ATE} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0}}\)

```
# Calculate the number of observations in the treatment and control groups
n_treat <- length(women$water[women$reserved == 1])
n_control <- length(women$water[women$reserved == 0])
## Calculate the standard error for the drinking-water facilities ATE
water_se <- sqrt(
(var(women$water[women$reserved == 1])/n_treat) +
(var(women$water[women$reserved == 0])/n_control)
)
## Calculate the standard error for the irrigation facilities ATE
irrigation_se <- sqrt(
(var(women$irrigation[women$reserved == 1])/n_treat) +
(var(women$irrigation[women$reserved == 0])/n_control)
)
water_se
```

`[1] 5.100282`

`irrigation_se`

`[1] 0.9674094`

**Question 4.** *Using the values you have just calculated, conduct a hypothesis test against the null hypothesis that the average treatment effect of a female-led council is zero (again, for both irrigation and water). Assume that the sampling distribution of the test statistic under the null hypothesis is well approximated by the standard normal distribution (i.e. you can use `pnorm` to work out the relevant p-values).*

## Reveal answer

```
## Calculate the t-statistics
water_t_stat <- water_ate/water_se
irrigation_t_stat <- irrigation_ate/irrigation_se
## Calculate the two-sided p-values
water_p_value <- (1-pnorm(water_t_stat))*2
irrigation_p_value <- pnorm(irrigation_t_stat)*2
water_p_value
```

`[1] 0.06966231`

`irrigation_p_value`

`[1] 0.7026289`

**Question 5.** *Calculate the confidence intervals for these differences in means.*

## Reveal answer

```
# Calculate the confidence intervals
water_upper_bound <- water_ate + 1.96*water_se
water_lower_bound <- water_ate - 1.96*water_se
irrigation_upper_bound <- irrigation_ate + 1.96*irrigation_se
irrigation_lower_bound <- irrigation_ate - 1.96*irrigation_se
# Present the results in a data.frame
out <- data.frame(outcome = c("Water","Irrigation"),
ate = c(water_ate,irrigation_ate),
upper_ci = c(water_upper_bound,irrigation_upper_bound),
lower_ci = c(water_lower_bound, irrigation_lower_bound))
out
```

```
outcome ate upper_ci lower_ci
1 Water 9.2524230 19.248977 -0.7441306
2 Irrigation -0.3693319 1.526791 -2.2654545
```

**Question 6.** *What do the conclusions of these tests suggest about the effects of female leadership on policy outcomes?*

## Reveal answer

The reservation policy had no effect on the number of irrigation systems in villages, but seems to have had a positive effect on the number of drinking water facilities. In particular, our best estimate of the average treatment effect suggests that the reservation policy increased the number of drinking water facilities in a GP by about 9 on average. That said, the estimates are sufficiently uncertain that we cannot reject the null hypothesis of no effect at the 95% confidence level for either of the outcome variables.

### 2.2.3 T-tests in R

It is relatively laborious to go through those steps each time you want to conduct a hypothesis test, so normally we would just use functions built into R that allow us to do this more easily. The syntax for the main arguments for specifying a t-test in R is:

`t.test(x, y, alt, mu, conf)`

Let’s have a look at the arguments.

Argument | Description |
---|---|
`x` | A vector of values from one group of observations |
`y` | A vector of values from a different group of observations |
`mu` | The value of the difference in means under the null hypothesis. The default value is 0, but it could take other values if required |
`alt` | There are two alternatives to the null hypothesis that the difference in means is zero: the difference could be either smaller or larger than zero. To test against both alternatives, we set `alt = "two.sided"` |
`conf` | Here, we set the level of confidence that we want in rejecting the null hypothesis. Common confidence levels are: 95%, 99%, and 99.9% |
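To see these arguments in action before applying them to the real data, here is a minimal sketch using simulated data (the group names and parameter values are invented purely for illustration):

```r
## A minimal t.test sketch with simulated data (illustrative values only)
set.seed(123)
group_a <- rnorm(100, mean = 5, sd = 2)  # outcomes for one group
group_b <- rnorm(100, mean = 4, sd = 2)  # outcomes for another group

## Two-sided test of the null hypothesis that the difference in means is zero
result <- t.test(x = group_a, y = group_b,
                 mu = 0, alt = "two.sided", conf = 0.95)
result$p.value   # p-value for the difference in means
result$conf.int  # 95% confidence interval for the difference
```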

**Question 7.** *Using the t.test function, check that your answer to question 4 is correct. That is, use the t.test function to conduct hypothesis tests that the ATE of a female-led council is zero for both irrigation and drinking water investment.*

## Reveal answer

```
water_t_test <- t.test(x = women$water[women$reserved==1],
y = women$water[women$reserved==0],
mu = 0,
alt = "two.sided",
conf = 0.95)
irrigation_t_test <- t.test(x = women$irrigation[women$reserved==1],
y = women$irrigation[women$reserved==0],
mu = 0,
alt = "two.sided",
conf = 0.95)
water_t_test
```

```
Welch Two Sample t-test
data: women$water[women$reserved == 1] and women$water[women$reserved == 0]
t = 1.8141, df = 122.05, p-value = 0.07212
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.8440572 19.3489031
sample estimates:
mean of x mean of y
23.99074 14.73832
```

`irrigation_t_test`

```
Welch Two Sample t-test
data: women$irrigation[women$reserved == 1] and women$irrigation[women$reserved == 0]
t = -0.38177, df = 306.96, p-value = 0.7029
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.272925 1.534261
sample estimates:
mean of x mean of y
3.018519 3.387850
```

The p-values for the difference in means using the `t.test` function are very similar to those we calculated manually. The `t.test` p-value for the water ATE is 0.072119 compared to 0.069662 from the manual calculation. The `t.test` p-value for the irrigation ATE is 0.702893 compared to 0.702629 from the manual calculation. The small differences here are attributable to the fact that we used the standard normal distribution to calculate the manual values, while `t.test` uses the t-distribution. (If you are curious, you can replicate the exact p-values by using the `pt()` function with the appropriate test-statistic values and degrees of freedom.)
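For example, a minimal sketch with simulated data (the numbers are invented, purely for illustration) shows that `pt()` reproduces the `t.test` p-value exactly:

```r
## Sketch: replicating a t.test p-value with pt() (simulated data)
set.seed(42)
treat <- rnorm(50, mean = 10, sd = 3)
control <- rnorm(50, mean = 9, sd = 3)

tt <- t.test(treat, control)

## Recompute the two-sided p-value from the test statistic and the
## Welch degrees of freedom that t.test reports
manual_p <- 2 * pt(-abs(tt$statistic), df = tt$parameter)
all.equal(as.numeric(manual_p), tt$p.value)
```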

### 2.2.4 Linear regression in R

Another approach to analysing experimental data is to specify a linear regression where we model our two outcome variables (`irrigation`, `water`) as a function of the treatment variable (`reserved`). Recall that in this setup the estimated coefficient on the treatment variable will be equal to the difference in means we calculated above (and the standard error, confidence intervals, and p-values will also all follow through as above).
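If you want to convince yourself of this equivalence, here is a minimal sketch with simulated data (the variable names and parameter values are invented for illustration):

```r
## Sketch: the lm() coefficient on a binary treatment equals the
## difference in group means (simulated, illustrative data)
set.seed(1)
sim <- data.frame(treated = rep(c(0, 1), each = 100))
sim$outcome <- 2 + 3 * sim$treated + rnorm(200)

diff_in_means <- mean(sim$outcome[sim$treated == 1]) -
  mean(sim$outcome[sim$treated == 0])
reg_coef <- coef(lm(outcome ~ treated, data = sim))["treated"]

all.equal(as.numeric(reg_coef), diff_in_means)
```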

We run linear regressions using the `lm()` function in R (`lm` stands for **L**inear **M**odel). The `lm()` function needs to know a) the relationship we’re trying to model and b) the dataset containing our observations. The two arguments we need to provide to the `lm()` function are described below.

Argument | Description |
---|---|
`formula` | The `formula` describes the relationship between the dependent and independent variables, for example `dependent.variable ~ independent.variable` |
`data` | The name of the dataset that contains the variables of interest |

For more information on how the `lm()` function works, type `help(lm)` in R.

**Question 8.** *Specify linear models for water and irrigation as a function of reserved. Assign the output of these models to objects with sensible names. Use the summary function on these objects to examine the coefficients, standard errors and p-values.*

## Reveal answer

```
# Estimate linear models
water_lm <- lm(water ~ reserved, data = women)
irrigation_lm <- lm(irrigation ~ reserved, data = women)
# Summarize output
summary(water_lm)
```

```
Call:
lm(formula = water ~ reserved, data = women)
Residuals:
Min 1Q Median 3Q Max
-23.991 -14.738 -7.865 2.262 316.009
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.738 2.286 6.446 4.22e-10 ***
reserved 9.252 3.948 2.344 0.0197 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 33.45 on 320 degrees of freedom
Multiple R-squared: 0.01688, Adjusted R-squared: 0.0138
F-statistic: 5.493 on 1 and 320 DF, p-value: 0.0197
```

`summary(irrigation_lm)`

```
Call:
lm(formula = irrigation ~ reserved, data = women)
Residuals:
Min 1Q Median 3Q Max
-3.388 -3.388 -3.019 -1.019 86.612
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.3879 0.6498 5.214 3.33e-07 ***
reserved -0.3693 1.1220 -0.329 0.742
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.506 on 320 degrees of freedom
Multiple R-squared: 0.0003385, Adjusted R-squared: -0.002785
F-statistic: 0.1084 on 1 and 320 DF, p-value: 0.7422
```

The regression estimate of the difference in means (i.e. \(\beta_\text{reserved}\)) is 9.252423 for the drinking water outcome and -0.3693319 for the irrigation outcome, which are the same as the manually calculated differences.

## 2.3 Homework

**Problem 1: Reanalysis of Gerber, Green and Larimer (2008)**

‘Why do large numbers of people vote, despite the fact that, as Hegel once observed, “the casting of a single vote is of no significance where there is a multitude of electors”?’

This is the question that drives the experimental analysis of Gerber, Green and Larimer (2008). If it is irrational to vote because the costs of doing so (time spent informing oneself, time spent getting to the polling station, etc.) are clearly greater than the gains to be made from voting (the probability that any individual voter will be decisive in an election is vanishingly small), then why do we observe millions of people voting in elections? One commonly proposed answer is that voters may have some sense of civic duty which drives them to the polls. Gerber, Green and Larimer investigate this idea empirically by priming voters to think about civic duty while also varying the amount of social pressure voters are subject to.

In a field experiment in advance of the 2006 primary election in Michigan, nearly 350,000 voters were assigned at random to one of four treatment groups, where voters received mailouts which encouraged them to vote, or a control group where voters received no mailout. The treatment and control conditions were as follows:

- **Treatment 1 (“Civic Duty”)**: Voters receive a mailout reminding them that voting is a civic duty.
- **Treatment 2 (“Hawthorne”)**: Voters receive a mailout telling them that researchers would be studying their turnout based on public records.
- **Treatment 3 (“Self”)**: Voters receive a mailout displaying the record of turnout for their household in prior elections.
- **Treatment 4 (“Neighbors”)**: Voters receive a mailout displaying the record of turnout for their household and their neighbours’ households in prior elections.
- **Control**: Voters receive no mailout.

Load the replication data for Gerber, Green and Larimer (2008). This data is stored in `.Rdata` format, which is the main way to save data in R. Therefore you will not be able to use `read.csv`, but instead should use the `load` function.

`load("gerber_green_larimer.Rdata")`

Once you have loaded the data, familiarise yourself with the `gerber` object, which should be in your current environment. Use the `str` and `summary` functions to get an idea of what is in the data. There are 5 variables in this data.frame:

Variable name | Description |
---|---|
voted | Indicator for whether the voter voted in the 2006 election (1) or did not vote (0) |
treatment | Factor variable indicating which treatment arm (or control group) the voter was allocated to |
sex | Sex of the respondent |
yob | Year of birth of the respondent |
p2004 | Indicator for whether the voter voted in the 2004 election (Yes) or not (No) |

Calculate the turnout rates for each of the experimental groups (4 treatments, 1 control). Calculate the number of individuals allocated to each group. Recreate table 2 on p. 38 of the paper.

## Solution

Here is one (somewhat laborious) way of constructing the table:

```
## Calculate the mean outcome for each condition
y_bar_control <- mean(gerber$voted[gerber$treatment == "Control"])
y_bar_civic <- mean(gerber$voted[gerber$treatment == "Civic Duty"])
y_bar_hawthorne <- mean(gerber$voted[gerber$treatment == "Hawthorne"])
y_bar_self <- mean(gerber$voted[gerber$treatment == "Self"])
y_bar_neighbor <- mean(gerber$voted[gerber$treatment == "Neighbors"])
## Calculate the total number of observations for each condition
n_control <- sum(gerber$treatment == "Control")
n_civic <- sum(gerber$treatment == "Civic Duty")
n_hawthorne <- sum(gerber$treatment == "Hawthorne")
n_self <- sum(gerber$treatment == "Self")
n_neighbor <- sum(gerber$treatment == "Neighbors")
## Concatenate into two vectors (using "round" to round the percentages to one decimal place)
percentages <- round(c(y_bar_control, y_bar_civic, y_bar_hawthorne, y_bar_self, y_bar_neighbor)*100, 1)
totals <- c(n_control, n_civic, n_hawthorne, n_self, n_neighbor)
## Combine into a data.frame object
table_two <- data.frame(rbind(percentages, totals))
## Provide the correct names
rownames(table_two) <- c("Percentage voting", "N of individuals")
colnames(table_two) <- c("Control", "Civic Duty", "Hawthorne", "Self", "Neighbors")
print(table_two)
```

```
                   Control Civic Duty Hawthorne    Self Neighbors
Percentage voting     29.7       31.5      32.2    34.5      37.8
N of individuals  191243.0    38218.0   38204.0 38218.0   38201.0
```

Here is an alternative way that is more efficient, but the code may be less readable and take more work to figure out what is going on:

```
## Calculate the mean outcome for each condition using the aggregate function
y_bars <- aggregate(gerber$voted, list(gerber$treatment),
                    FUN = function(x) round(mean(x)*100, 1))
## Calculate the number of observations for each condition using the table function
ns <- table(gerber$treatment)
y_bars
```

```
     Group.1    x
1    Control 29.7
2 Civic Duty 31.5
3  Hawthorne 32.2
4       Self 34.5
5  Neighbors 37.8
```

`ns`

```
   Control Civic Duty  Hawthorne       Self  Neighbors
    191243      38218      38204      38218      38201
```

One could of course then take these values and create a pretty table with them!

Conduct a series of t-tests between each treatment condition and the control condition. Present the results of the t-tests either as confidence intervals for the difference in means, or as p-values for the null hypothesis that \(\bar{Y}_c = \bar{Y}_t\).

## Solution

```
t.test(x = gerber$voted[gerber$treatment == "Civic Duty"],
       y = gerber$voted[gerber$treatment == "Control"])$conf.int
```

```
[1] 0.01281368 0.02298501
attr(,"conf.level")
[1] 0.95
```

```
t.test(x = gerber$voted[gerber$treatment == "Hawthorne"],
       y = gerber$voted[gerber$treatment == "Control"])$conf.int
```

```
[1] 0.02062181 0.03085081
attr(,"conf.level")
[1] 0.95
```

```
t.test(x = gerber$voted[gerber$treatment == "Self"],
       y = gerber$voted[gerber$treatment == "Control"])$conf.int
```

```
[1] 0.04332558 0.05370080
attr(,"conf.level")
[1] 0.95
```

```
t.test(x = gerber$voted[gerber$treatment == "Neighbors"],
       y = gerber$voted[gerber$treatment == "Control"])$conf.int
```

```
[1] 0.07603405 0.08658577
attr(,"conf.level")
[1] 0.95
```

In all cases, the difference between the treatment and control condition is statistically significant at the 95% level.

Create a variable that is equal to 1 if a respondent is female, and 0 otherwise. Create a second variable that measures the age of each voter in years at the time of the experiment (which was conducted in 2006). Create a third variable that is equal to 1 if the voter voted in the November 2004 midterm election. Using these variables, conduct balance checks to establish whether there are potentially confounding differences between treatment and control groups. (**Hint:** You might find the `ifelse` function useful for creating the two dummy variables. And remember that a person born in 1986 would have been 20 in 2006.)

## Solution

```
## Female dummy variable
gerber$female <- ifelse(gerber$sex == "female", 1, 0)
## Age variable
gerber$age <- 2006 - gerber$yob
## 2004 variable
gerber$turnout04 <- ifelse(gerber$p2004 == "Yes", 1, 0)
## Balance
summary(lm(female ~ treatment, data = gerber))
```

```
Call:
lm(formula = female ~ treatment, data = gerber)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5002 -0.4989 -0.4989  0.5011  0.5011 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)         0.4989411  0.0011434 436.385   <2e-16 ***
treatmentCivic Duty 0.0012420  0.0028016   0.443    0.658    
treatmentHawthorne  0.0000642  0.0028020   0.023    0.982    
treatmentSelf       0.0006402  0.0028016   0.229    0.819    
treatmentNeighbors  0.0011243  0.0028021   0.401    0.688    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5 on 344079 degrees of freedom
Multiple R-squared:  9.655e-07, Adjusted R-squared:  -1.066e-05 
F-statistic: 0.08305 on 4 and 344079 DF,  p-value: 0.9876
```

`summary(lm(age ~ treatment, data = gerber))`

```
Call:
lm(formula = age ~ treatment, data = gerber)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.853  -8.814   0.186   9.186  56.295 

Coefficients:
                    Estimate Std. Error  t value Pr(>|t|)    
(Intercept)         49.81355    0.03304 1507.651   <2e-16 ***
treatmentCivic Duty -0.15451    0.08096   -1.909   0.0563 .  
treatmentHawthorne  -0.10875    0.08097   -1.343   0.1792    
treatmentSelf       -0.02104    0.08096   -0.260   0.7950    
treatmentNeighbors   0.03939    0.08097    0.486   0.6267    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.45 on 344079 degrees of freedom
Multiple R-squared:  1.651e-05, Adjusted R-squared:  4.883e-06 
F-statistic: 1.42 on 4 and 344079 DF,  p-value: 0.2244
```

`summary(lm(turnout04 ~ treatment, data = gerber))`

```
Call:
lm(formula = turnout04 ~ treatment, data = gerber)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4067 -0.4003 -0.4003  0.5997  0.6006 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          0.4003388  0.0011209 357.147   <2e-16 ***
treatmentCivic Duty -0.0008935  0.0027466  -0.325   0.7449    
treatmentHawthorne   0.0028912  0.0027471   1.052   0.2926    
treatmentSelf        0.0021417  0.0027466   0.780   0.4355    
treatmentNeighbors   0.0063259  0.0027471   2.303   0.0213 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4902 on 344079 degrees of freedom
Multiple R-squared:  1.922e-05, Adjusted R-squared:  7.597e-06 
F-statistic: 1.653 on 4 and 344079 DF,  p-value: 0.1578
```

Looking at these three pre-treatment covariates, there is little evidence of imbalance across the treatment and control groups. There are no significant gender or age differences between the control group and any of the treatment groups. There is some evidence that a slightly higher proportion of voters turned out to vote in 2004 in the “Neighbors” treatment condition than in the control group (i.e. \(p < 0.05\)), but the difference is very small: turnout was about 0.6 of a percentage point higher in the treatment group than in the control group (where turnout was about 40%). Overall, these tables do not indicate any failures of randomization.

Estimate the average treatment effects of the different treatment arms whilst controlling for the variables you created for the question above. How do these estimates differ from regression estimates of the treatment effects only (i.e. without controlling for other factors)? Why?

## Solution

```
# Estimate a baseline model
baseline_model <- lm(voted ~ treatment, data = gerber)
# Estimate a model with covariates
covariate_model <- lm(voted ~ treatment + female + age + turnout04, data = gerber)
# Construct a data.frame with the treatment coefficients from each
coef_compare <- data.frame(baseline = coef(baseline_model)[2:5],
                           covariate = coef(covariate_model)[2:5])
coef_compare
```

```
                      baseline  covariate
treatmentCivic Duty 0.01789934 0.01865266
treatmentHawthorne  0.02573631 0.02573836
treatmentSelf       0.04851319 0.04828410
treatmentNeighbors  0.08130991 0.08022561
```

As expected from a randomized experiment, controlling for pre-treatment covariates has very little consequence for the estimated treatment effects. Because the covariates are balanced in expectation (and in this exact randomization there is also very little imbalance across the treatment arms), estimating the treatment effects conditional on covariates results in very similar estimates as the baseline estimates.

Estimate the treatment effects separately for men and women. Do you note any differences in the impact of the treatment amongst these subgroups?

## Solution

```
# Estimate regression models on subsets of data
male_model <- lm(voted ~ treatment, data = gerber[gerber$female == 0, ])
female_model <- lm(voted ~ treatment, data = gerber[gerber$female == 1, ])
# Construct a data.frame with the treatment coefficients from each
coef_compare <- data.frame(male = coef(male_model)[2:5],
                           female = coef(female_model)[2:5])
coef_compare
```

```
                          male     female
treatmentCivic Duty 0.01994637 0.01588446
treatmentHawthorne  0.02468701 0.02679139
treatmentSelf       0.04575431 0.05129251
treatmentNeighbors  0.08174818 0.08089951
```

The treatment effects are in fact very similar between men and women. The largest difference in effect size is the “Self” treatment condition, but even here the difference is only one half of a percentage point. Both men and women seem equally likely to respond to appeals to civic duty and social pressure when making the decision to turn out to vote.

**Problem 2 (difficult): Randomization inference**

Using the Chattopadhyay and Duflo `women.csv` data, conduct a test of the *sharp null hypothesis* that the treatment effect on the drinking water outcome variable is zero for all observations. To construct a sampling distribution of the average treatment effect under the sharp null, you will need to simulate a large number of new randomization assignments (try 10000), and calculate and store the estimated average treatment effect for each of these randomizations. With these simulated average treatment effects in hand:

- Report the probability that we would observe a value as large or larger than 9.252 (i.e. the ATE we estimated from the true randomization) if the true effect were zero for all observations.
- Plot a histogram of the ATEs under the null with a vertical line indicating the true treatment effect.

(**Hint:** You can re-randomize the treatment assignment by using the `sample` function, which takes a vector as its first argument (the thing you would like to randomly sample from), and an integer as its second argument (the number of things you would like to randomly sample). You may also find it useful to use the `replicate` function, which allows you to repeatedly evaluate an expression or a function and outputs the results as a vector. Finally, `hist` will create a histogram, and `abline` will allow you to add lines to a plot.)

## Solution

```
# Define a function for a) resampling from the treatment vector and b) calculating the simulated ATE
ate_sharp_null <- function(){
  women$reserved_tmp <- sample(x = women$reserved, size = nrow(women), replace = F)
  ate_tmp <- mean(women$water[women$reserved_tmp == 1]) - mean(women$water[women$reserved_tmp == 0])
  return(ate_tmp)
}
# Repeat the randomization process 10000 times, storing the ATEs as a vector
sampling_dist <- replicate(10000, ate_sharp_null())
# Find the proportion of simulated treatment effects that are greater than the estimated ATE
mean(sampling_dist >= water_ate)
```

`[1] 0.0121`

```
# Plot a histogram of the sampling distribution with a vertical line
hist(sampling_dist)
abline(v = water_ate)
```

**Problem 3: Statistical inference review**

You are told that the difference in mean earnings between two groups of survey respondents is £470. The first group, let’s call them the Tigers, has 250 individuals and the standard deviation of their earnings is £2779. The second group, the Cats, also has 250 individuals and the standard deviation of their earnings is £3068. On average, the Tigers earn more than the Cats.

What is the standard error of the difference in means between Tigers and Cats?

## Solution

Denote the mean earnings of the Tigers as \(\bar{Y_t}\) and of the Cats as \(\bar{Y_c}\). The difference in means is:

\(\bar{Y_t} - \bar{Y_c} = 470\)

Now denote the standard deviation of Tigers’ earnings as \(s_t\), and \(s_c\) for the Cats, and equivalently \(n_t\) and \(n_c\) for the group sample sizes. The standard error for the difference in means:

\(SE(\bar{Y_t} - \bar{Y_c}) = \sqrt{\frac{s_t^2}{n_t} + \frac{s_c^2}{n_c}} = \sqrt{\frac{2779^2}{250} + \frac{3068^2}{250}} = 261.8\)

What is the appropriate t-statistic relevant to this difference?

## Solution

\(t = \frac{\bar{Y_t} - \bar{Y_c}}{SE(\bar{Y_t} - \bar{Y_c})} = \frac{470}{261.8} = 1.80\)

What is the 95% confidence interval on the difference in mean earnings between Tigers and Cats?

## Solution

95% Confidence Interval: \(470 \pm 1.96 \cdot 261.805\)

Upper Bound: \(470 + 513.138 = 983.138\)

Lower Bound: \(470 - 513.138 = -43.138\)
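If you would like to check these calculations yourself, a short R sketch reproduces them using the figures given in the question:

```r
## Reproducing the Problem 3 calculations
s_t <- 2779; s_c <- 3068  # standard deviations of earnings
n_t <- 250;  n_c <- 250   # group sizes
diff_means <- 470         # difference in mean earnings

se <- sqrt(s_t^2 / n_t + s_c^2 / n_c)   # standard error of the difference
t_stat <- diff_means / se               # t-statistic
ci <- diff_means + c(-1.96, 1.96) * se  # 95% confidence interval

round(se, 1)     # 261.8
round(t_stat, 2) # 1.8
round(ci, 3)     # -43.138 983.138
```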