6 Regression (Causality)
In the lecture this week, we discussed the use of regression for estimating causal effects. We saw that, when analysing data from randomized experiments, regression is a flexible tool that can help to a) understand treatment effect heterogeneity across different types of units; and b) calculate treatment effects for non-binary independent variables. We then discussed the assumptions required to make causal inferences from the analysis of observational data with regression models, and particularly focused on issues relating to omitted variable bias and “controlling” for confounding factors. Finally, we gave an example of a regression discontinuity design, which is one strategy for making causal inferences from observational data in the presence of potentially unobserved confounding variables.
In seminar this week, we will:
- …use regression as a method for “controlling” for potentially confounding covariates.
- …consider the assumptions required for making causal interpretations of regression coefficients.
- …practice calculating fitted values from an interaction model.
- …implement a regression discontinuity design.
Before coming to the seminar
- Please read chapter 4, “Prediction” in Quantitative Social Science: An Introduction
What is the monetary value of serving as an elected politician? Do politicians benefit financially because of their political offices? Chapter 4 of the textbook (pp. 176 - 181) reports on findings from a paper by Andrew Eggers and Jens Hainmueller (‘MPs for sale? Returns to Office in Postwar British Politics’) which investigates the financial returns to serving in parliament by studying data from the UK. Eggers and Hainmueller compare the wealth at the time of death for individuals who ran for office and won (MPs) to individuals who ran for office and lost (candidates) in order to draw causal inferences about the effects of political office on wealth. The textbook shows results from a regression discontinuity design which uses this data, in which the margin of victory is used as a variable that creates “as if random” variation between election winners and election losers. From that analysis, we can see that there is a large causal effect of holding office on wealth for Tory candidates, but a small causal effect of holding office for Labour candidates.
In this seminar, we will try to replicate the findings from the regression discontinuity design. We will use the same data, but rather than using the quasi-random variation induced by election results to identify the causal effect of office holding, we will instead use regression to control for potentially confounding variables.
The data set is in the CSV file `mps.csv`, which you should download and store in your `PUBL0055/data` folder as in previous weeks. Make sure your working directory is set to the appropriate location, then load the data into a data frame called `mps` (for example with `read.csv()`).
This data includes observations of 425 individuals. It contains variables for the main outcome of interest, the (log) wealth at the time of the individual's death (`ln.gross`), and for the treatment, whether the individual was elected to parliament (`elected == 1`) or failed to win their election (`elected == 0`). The data also includes information on a rich set of covariates.
The names and descriptions of variables are:
| Variable | Description |
|---|---|
| | Surname of the candidate |
| | First name of the candidate |
| `ln.gross` | Log gross wealth at the time of death |
| | Log net wealth at the time of death |
| `elected` | 1 if the candidate was elected, 0 if they were not elected |
| | Year of birth |
| | Year of death |
| `aristo` | 1 if the candidate had an aristocratic title of nobility, 0 otherwise |
| `female` | 1 if the candidate is a woman, 0 otherwise |
| | The region in which the candidate stood for election |
| | Margin of the candidate's party in the previous election |
| `margin` | Margin of victory (positive when the candidate won the election, negative when they lost the election) |
| `occupation` | The occupation of the candidate before they stood for election |
| `school` | The type of secondary school that the candidate attended |
| `university` | The type of university that the candidate attended |
| `party` | Whether the candidate stood for the Labour ("Labour") or Conservative ("Tory") party |
Use a simple linear regression model to evaluate the relationship between gross wealth at death and whether or not a candidate was elected. Interpret the coefficient on the `elected` variable. Is the relationship positive or negative? Does this represent a causal difference?
```
## 
## Call:
## lm(formula = ln.gross ~ elected, data = mps)
## 
## Coefficients:
## (Intercept)      elected  
##     12.4185       0.5178
```
The relationship is positive. The naive model suggests that candidates who were elected to parliament were, on average, 0.52 log points wealthier at death than candidates who were not elected. Since exp(0.52) ≈ 1.68, this corresponds to roughly 68% higher wealth. This is clearly not a causal difference, because there are many potentially confounding differences between those who are elected and those who are not. In other words, the difference estimated in this regression is likely to be subject to omitted variable bias.
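Because the outcome is measured in logs, the coefficient is easiest to read once it is converted into a percentage difference. Below is a minimal R sketch of that arithmetic. It uses the rounded coefficient printed above, and the variable names are illustrative. Remember that this only re-expresses the naive association; it does not make it causal.

```r
# Coefficient on elected from the naive model (copied from the output above)
coef_elected <- 0.5178

# A difference of b in log wealth means wealth is exp(b) times as large
ratio <- exp(coef_elected)

# Convert the ratio into a percentage difference
pct_diff <- 100 * (ratio - 1)
round(pct_diff, 1)
```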
a. Create a box plot with `school` on the x-axis and `ln.gross` on the y-axis. Is there any association between where a candidate went to school and the amount of money they were worth at the time of their death?
Average wealth at the time of death clearly differs by the school that a politician attended. Those who attended Eton (a British “public” school which has been responsible for educating several UK Prime Ministers, as well as many MPs) are much wealthier on average at the time of their deaths than those from other “public” schools, or than those who attended “regular” schools.
b. What proportion of candidates who attended Eton were elected? What proportion of candidates who attended public school were elected? What proportion of candidates who attended "regular" schools were elected?
```
##          
##                   0         1
##   Eton    0.2222222 0.7777778
##   Public  0.6076923 0.3923077
##   Regular 0.6529851 0.3470149
```
78% of candidates who attended Eton were elected, compared to 39% from other public schools and 35% from regular schools.
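The table above contains row proportions. Within each school type, it gives the share of candidates who lost (0) and who won (1). The sketch below shows how such a table can be produced with `table()` and `prop.table()`. It runs on a small hypothetical data frame, `toy`, rather than on `mps`.

```r
# Hypothetical toy data (not the mps data)
toy <- data.frame(
  school = c("Eton", "Eton", "Eton", "Eton", "Public", "Public",
             "Regular", "Regular", "Regular", "Regular"),
  elected = c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1)
)

# Counts: school type in rows, election outcome (0/1) in columns
counts <- table(toy$school, toy$elected)

# Row proportions: each row sums to 1
row_props <- prop.table(counts, margin = 1)
row_props
```

On the seminar data, the analogous call would be `prop.table(table(mps$school, mps$elected), margin = 1)`. That is one way to obtain these proportions, not necessarily the code used to produce the output above.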
c. What do the results of the two subquestions above suggest about the relationship between gross wealth at death and whether a candidate was elected?
As we discussed in lecture, when dealing with observational data, we should always be wary about drawing causal conclusions from regression analyses because of the potential for confounding/omitted variable bias. The comparisons above show that the relationship between whether a candidate was elected and gross wealth at death is likely to be confounded by educational background.
In particular, omitted variables (variables not included in our regressions) will bias the regression estimates away from the causal effect of our explanatory variables when the omitted variable is correlated with both our independent variable and our dependent variable. In this example, there is a clear relationship between the type of school an individual attended and their wealth at death (as seen in the boxplot from question 2a). There is also a clear correlation between the type of school attended and whether the candidate won their election (as seen in the results from question 2b). Accordingly, both criteria for omitted variable bias are present here!
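A small simulation makes the two conditions for omitted variable bias concrete. The sketch below uses entirely hypothetical data. An `elite` schooling indicator raises both the chance of being elected and wealth at death, and the true effect of election is set to 0.3 log points. Omitting `elite` inflates the `elected` coefficient, while controlling for it recovers a value close to 0.3.

```r
set.seed(123)
n <- 5000

# Hypothetical confounder: 1 if the candidate attended an elite school
elite <- rbinom(n, 1, 0.3)

# Elite-school candidates are more likely to win...
elected <- rbinom(n, 1, ifelse(elite == 1, 0.75, 0.35))

# ...and are wealthier whether or not they win; the true effect of election is 0.3
true_effect <- 0.3
ln_wealth <- 12 + true_effect * elected + 1 * elite + rnorm(n)

naive_fit <- lm(ln_wealth ~ elected)
controlled_fit <- lm(ln_wealth ~ elected + elite)

# The naive estimate is biased upwards; the controlled estimate is close to 0.3
coef(naive_fit)["elected"]
coef(controlled_fit)["elected"]
```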
This implies that the regression coefficient from the `naive_model` that we estimated in question 1 does not represent the causal effect of office holding on wealth, as it is subject to omitted variable bias.
d. Estimate a new linear regression model to evaluate the relationship between gross wealth at death and whether or not a candidate was elected, but in this model you should also control for `school`. Does the coefficient associated with the `elected` variable change when you include the additional control? Why?
```
## 
## =====================================
##                Model 1     Model 2   
## -------------------------------------
## (Intercept)     12.42 ***   13.33 ***
##                 (0.06)      (0.21)   
## elected          0.52 ***    0.41 ***
##                 (0.10)      (0.10)   
## schoolPublic                -0.75 ***
##                             (0.22)   
## schoolRegular               -1.02 ***
##                             (0.21)   
## -------------------------------------
## R^2              0.06        0.11    
## Adj. R^2         0.05        0.11    
## Num. obs.      425         425       
## =====================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```
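Including `school` leads to a decrease in the `elected` coefficient: it falls from 0.52 in model 1 to 0.41 in model 2. This change occurs because in the second model we are "holding constant" the school that the candidate attended while estimating the effect of being elected to office on wealth. Etonians were both more likely to be elected and wealthier at death. Part of the naive gap between winners and losers therefore reflected differences in schooling rather than the effect of office.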
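How far a coefficient moves when a control is added is not arbitrary. In OLS, the naive coefficient equals the controlled coefficient plus a correction for each added dummy. Each correction is that dummy's coefficient multiplied by the slope from regressing the dummy on `elected`; this is the omitted variable bias formula. The sketch below checks the identity on simulated data with a three-category `school` factor. The data and numbers are hypothetical, not the `mps` estimates.

```r
set.seed(1)
n <- 500

# Hypothetical data: Etonians are more likely to be elected and wealthier
school <- factor(sample(c("Eton", "Public", "Regular"), n, replace = TRUE))
elected <- rbinom(n, 1, ifelse(school == "Eton", 0.75, 0.35))
wealth <- 12 + 0.4 * elected + ifelse(school == "Eton", 1, 0) + rnorm(n)

naive <- lm(wealth ~ elected)
full <- lm(wealth ~ elected + school)

# Dummies for the non-baseline school levels (Eton is the baseline)
d_public <- as.numeric(school == "Public")
d_regular <- as.numeric(school == "Regular")

# Auxiliary regressions: how each omitted dummy varies with elected
delta_public <- coef(lm(d_public ~ elected))["elected"]
delta_regular <- coef(lm(d_regular ~ elected))["elected"]

# Omitted variable bias formula: naive = controlled + sum(gamma * delta)
reconstructed <- coef(full)["elected"] +
  coef(full)["schoolPublic"] * delta_public +
  coef(full)["schoolRegular"] * delta_regular

c(naive = unname(coef(naive)["elected"]), reconstructed = unname(reconstructed))
```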
a. Examine the other variables in the data and select an additional three control variables to include in your model. Do not pick these at random, but rather think about whether there is reason to believe that these variables might be a cause of omitted variable bias in the model specification you estimated in question 2.d above.
We have estimated a model that includes three additional predictors (`aristo`, `female` and `occupation`), all of which are plausibly associated with both the dependent variable and the main independent variable in the analysis. Recall that omitted variable bias is a concern when we have reason to believe that the omitted variables are related both to X and to Y.
In this case, it seems reasonable that whether a candidate is an aristocrat, whether they are female, and which occupation they had before running for office will all be correlated with whether or not the candidate wins the election (that is, they are probably correlated with X). Similarly, it seems likely that these factors are also correlated with the wealth of an individual when they die, and so are all potential sources of omitted variable bias.
b. Does the coefficient on the `elected` variable change when you estimate your new model?
```
## 
## ============================================================
##                          Model 1     Model 2     Model 3    
## ------------------------------------------------------------
## (Intercept)               12.42 ***   13.33 ***   13.50 ***
##                           (0.06)      (0.21)      (0.25)   
## elected                    0.52 ***    0.41 ***    0.42 ***
##                           (0.10)      (0.10)      (0.10)   
## schoolPublic                          -0.75 ***   -0.86 ***
##                                       (0.22)      (0.22)   
## schoolRegular                         -1.02 ***   -1.05 ***
##                                       (0.21)      (0.21)   
## aristo                                             0.42    
##                                                   (0.29)   
## female                                             0.16    
##                                                   (0.24)   
## occupationjournalist                              -0.05    
##                                                   (0.21)   
## occupationlawyer                                   0.27    
##                                                   (0.21)   
## occupationlocal politics                          -0.13    
##                                                   (0.20)   
## occupationteacher                                 -0.31    
##                                                   (0.17)   
## occupationunion                                   -0.54    
##                                                   (0.32)   
## occupationwhite collar                            -0.10    
##                                                   (0.21)   
## ------------------------------------------------------------
## R^2                        0.06        0.11        0.15    
## Adj. R^2                   0.05        0.11        0.13    
## Num. obs.                425         425         425       
## ============================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```
There is only a small change in the estimated coefficient on the `elected` variable when controlling for these other factors, from 0.41 to 0.42. It therefore remains very similar to the coefficient that we estimated in `control_model_1` (model 2, which controls only for school). This suggests that the school a candidate attended is a larger source of omitted variable bias than the additional control variables that we added for this question.
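A control only shifts the `elected` coefficient if it is related to being elected once the other controls are held fixed. The hypothetical simulation below adds a variable `z` that affects wealth but is generated independently of `elected`. The fit improves, but the `elected` coefficient barely changes. This is consistent with the pattern seen between models 2 and 3 above.

```r
set.seed(99)
n <- 5000
elected <- rbinom(n, 1, 0.4)

# Hypothetical control: affects wealth but is generated independently of elected
z <- rbinom(n, 1, 0.2)
ln_wealth <- 12 + 0.4 * elected + 0.5 * z + rnorm(n)

without_z <- lm(ln_wealth ~ elected)
with_z <- lm(ln_wealth ~ elected + z)

# Coefficient on elected barely moves...
coef(without_z)["elected"]
coef(with_z)["elected"]

# ...while the model explains more of the variation in wealth
summary(without_z)$r.squared
summary(with_z)$r.squared
```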
c. Does the coefficient estimate associated with the `elected` variable now describe a causal effect? What assumptions are required to give this coefficient a causal interpretation? Can you think of any reasons why these assumptions may not be plausible in this context?
In order to interpret this coefficient as the causal effect of being elected, we have to assume that our regression model controls for all potentially confounding variables (often called the "selection on observables" or "no unobserved confounding" assumption), and that it does so with the correct functional form. We have controlled for more variables in this specification than in either of the two previous models. But is it fair to say that we have controlled for every possible omitted variable that might be a source of bias? Probably not.
For instance, one potential confounder that we are not controlling for is candidate quality. It may be that the people who are elected are simply better candidates in many ways than the people who are not elected, and those quality differences may also be correlated with lifetime earnings. As a consequence, our estimate of the effect of being elected would again be biased. This is a particularly difficult problem because it is not clear how we would go about measuring the quality of different candidates, as this is an essentially unobservable quantity! Accordingly, we should still be cautious about giving these results a causal interpretation.
The original paper from which this data is taken shows that the effect of serving in office (i.e. getting elected) on a candidate's wealth is larger for Tory MPs than for Labour MPs. In this question, you will adapt the regression specification that you selected in question 3 to allow the effect of `elected` to vary by party.
a. Estimate a new regression model which includes an interaction term between `elected` and `party`.
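With an interaction term, the effect of `elected` is no longer a single coefficient. Consider a model of the form `y ~ elected * party` with Labour as the baseline level. On the log scale, the Labour effect is the `elected` coefficient, and the Tory effect is that coefficient plus the `elected:partyTory` interaction. The sketch below uses simulated, hypothetical data to show that these sums match the differences in fitted values from `predict()`.

```r
set.seed(42)
n <- 400

# Hypothetical data: the log-scale effect of election is 0.2 for Labour
# and 0.2 + 0.7 = 0.9 for Tory candidates
party <- factor(sample(c("Labour", "Tory"), n, replace = TRUE))
elected <- rbinom(n, 1, 0.4)
ln_wealth <- 12 + 0.2 * elected + 0.3 * (party == "Tory") +
  0.7 * elected * (party == "Tory") + rnorm(n)
sim <- data.frame(ln_wealth, elected, party)

fit <- lm(ln_wealth ~ elected * party, data = sim)
b <- coef(fit)

# Group-specific effects of being elected (log scale)
labour_effect <- unname(b["elected"])
tory_effect <- unname(b["elected"] + b["elected:partyTory"])

# The same quantities as differences in fitted values
grid <- data.frame(elected = c(0, 1, 0, 1),
                   party = c("Labour", "Labour", "Tory", "Tory"))
p <- predict(fit, newdata = grid)
c(labour = labour_effect, tory = tory_effect)
```

If the other covariates enter the model additively, as they do in this sketch, these log-scale differences do not depend on the covariate values chosen.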
b. Construct the fitted values for two Labour candidates: one who was elected, and one who was not. To calculate these values, you will have to choose values for all of the independent variables that you included in your model. Once you have calculated these values, exponentiate them using the `exp()` function, which converts them from the log scale into £ values. What is the effect, in pounds, of being elected as an MP for Labour candidates?
```r
# Assumes interaction_model was estimated in part a
# Select the covariate values at which to calculate predictions
labour_X_values <- data.frame(elected = c(0, 1),
                              party = "Labour",
                              school = "Regular",
                              aristo = 0,
                              female = 0,
                              occupation = "white collar",
                              university = "Degree")

# Calculate the fitted values (on the log scale)
labour_fitted_values <- predict(interaction_model, newdata = labour_X_values)

# Exponentiate these values to convert them into pound amounts
exponentiated_lab_fitted_vals <- exp(labour_fitted_values)

# Difference between the elected (row 2) and non-elected (row 1) candidate
exponentiated_lab_fitted_vals[2] - exponentiated_lab_fitted_vals[1]
```
```
##        2 
## 41334.96
```
For a Labour candidate with these covariate values, we estimate that being elected increases wealth at the time of death by £41335.
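Unlike the log-scale effect, the pound-scale effect depends on the covariate values chosen. This is because exp(a + b) - exp(a) = exp(a)(exp(b) - 1): the same log-point effect b translates into more pounds for a profile with higher baseline log wealth a. Below is a sketch with hypothetical numbers, not estimates from `mps.csv`.

```r
# Hypothetical log-scale quantities
log_effect <- 0.5
baseline_log_wealth <- c(11, 12, 13)  # fitted log wealth if not elected

# Pound difference between elected and non-elected profiles
pound_effect <- exp(baseline_log_wealth + log_effect) - exp(baseline_log_wealth)

# Equivalent closed form: exp(a) * (exp(b) - 1)
closed_form <- exp(baseline_log_wealth) * (exp(log_effect) - 1)

round(pound_effect)
```

This is why the £ figure should be reported together with the covariate profile used to calculate it.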
c. Repeat the fitted values calculations above for Tory candidates. What is the effect, in pounds, of being elected as an MP for Tory candidates?
```r
# Same covariate profile as above, but for a Tory candidate
tory_X_values <- data.frame(elected = c(0, 1),
                            party = "Tory",
                            school = "Regular",
                            aristo = 0,
                            female = 0,
                            occupation = "white collar",
                            university = "Degree")

tory_fitted_values <- predict(interaction_model, newdata = tory_X_values)
exponentiated_tory_fitted_vals <- exp(tory_fitted_values)

# Difference between the elected and non-elected candidate, in pounds
exponentiated_tory_fitted_vals[2] - exponentiated_tory_fitted_vals[1]
```
```
##        2 
## 191564.1
```
For a Tory candidate with the same covariate values, we estimate that being elected increases wealth at the time of death by £191564. This is more than four times the estimated effect for Labour candidates.
As shown in the textbook and the original article, Eggers and Hainmueller use a regression discontinuity design to analyse this data. What is the virtue of the RD design? What problem does it aim to solve? Which assumptions are required in order for us to consider the regression discontinuity design estimates to be “causal”?
The RD design provides an alternative approach to making causal inferences from observational data when we cannot control for all potentially confounding variables. We often cannot observe or measure all confounders. However, if we can identify a context in which the assignment of our treatment is "as good as random" for some units, we can use that quasi-randomness to identify the causal effect of our treatment on our outcome. The critical assumption of the regression discontinuity design is that treatment really is assigned essentially at random in the region of the threshold. In practice, this means two things: nothing other than the treatment that affects the outcome should jump at the threshold, and units cannot precisely control which side of the threshold they end up on. If this is not true, then our estimates will be subject to the same confounding problems that we had in the first place!
In the context of this example, the authors use the "margin of victory" variable to compare wealth outcomes for candidates who narrowly won the election (where `margin` was just greater than 0) with candidates who narrowly lost the election (where `margin` was just smaller than 0). Why might focussing on the differences between narrowly winning and narrowly losing candidates be a better comparison than simply comparing all elected candidates with all candidates who lost?
The intuition behind the design in this case is that candidates who narrowly win elections are unlikely to be very different, on average, from candidates who narrowly lose elections. Of course, if we were to compare all winners and all losers, we might expect there to be large confounding differences between them (as we saw above, they do in fact differ with regard to their educational backgrounds in this case). But the idea here is that whether a candidate gets 0.1% of the vote more than her opponent, or 0.1% less than her opponent, is essentially a random quantity, and therefore comparing these types of individuals will reduce the degree of confounding that affects our comparisons.
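This intuition can be checked on hypothetical simulated data in which unobserved `quality` drives both the vote margin and wealth. Comparing all winners with all losers is badly biased. Comparing only candidates within a narrow window around a margin of zero gets much closer to the true effect. Narrowing the window reduces, but does not entirely remove, the imbalance. That remaining imbalance is one reason RD analyses such as the textbook's fit separate regression lines on each side of the threshold.

```r
set.seed(2024)
n <- 20000

# Hypothetical, unobserved candidate quality
quality <- rnorm(n)

# Vote margin rises with quality but also has a large random component;
# a candidate is elected when the margin is positive
margin <- 0.1 * quality + rnorm(n, sd = 0.1)
elected <- as.numeric(margin > 0)

# Wealth depends on election (true effect 0.5) and on quality
true_effect <- 0.5
ln_wealth <- 12 + true_effect * elected + 0.8 * quality + rnorm(n, sd = 0.5)

# Comparison of all winners with all losers
all_diff <- mean(ln_wealth[elected == 1]) - mean(ln_wealth[elected == 0])

# Comparison restricted to candidates within 0.01 of the threshold
near_threshold <- abs(margin) < 0.01
close_diff <- mean(ln_wealth[near_threshold & elected == 1]) -
  mean(ln_wealth[near_threshold & elected == 0])

round(c(all = all_diff, close = close_diff, truth = true_effect), 2)
```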
Replicate the analysis in the figure on page 179 of the textbook, but create a single figure that combines both Labour and Conservative candidates in the same analysis. Also, use the `ln.gross` variable for the outcome, rather than the log net wealth variable used in the textbook.
```r
# Fit separate linear regressions on each side of the threshold (margin = 0)
lost_election_fit <- lm(ln.gross ~ margin, data = mps[mps$margin < 0, ])
won_election_fit <- lm(ln.gross ~ margin, data = mps[mps$margin > 0, ])

# Ranges of margin over which to draw each fitted line
lost_range <- c(min(mps$margin), 0)
won_range <- c(0, max(mps$margin))

# Fitted values at the ends of each range
lost_fitted <- predict(lost_election_fit, newdata = data.frame(margin = lost_range))
won_fitted <- predict(won_election_fit, newdata = data.frame(margin = won_range))

# Scatter plot with the two fitted lines and the threshold
plot(mps$margin, mps$ln.gross,
     xlab = "Margin of victory",
     ylab = "Log gross wealth at death",
     main = "RDD for Labour and Tory candidates")
lines(lost_range, lost_fitted, lwd = 3, col = "blue")
lines(won_range, won_fitted, lwd = 3, col = "blue")
abline(v = 0, lty = "dashed")
```
Calculate the treatment effect, in pounds, implied by these results.
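The RD treatment effect is the jump between the two fitted lines at `margin = 0`. To express it in pounds, exponentiate each side's prediction at the threshold before taking the difference. The sketch below wraps this in a hypothetical helper, `rd_effect()`, which mirrors the two regressions in the plotting code above. It is demonstrated on simulated data with a true jump of 0.6 log points.

```r
# Hypothetical helper: estimated jump at margin = 0, in log points and pounds
rd_effect <- function(data) {
  lost_fit <- lm(ln.gross ~ margin, data = data[data$margin < 0, ])
  won_fit <- lm(ln.gross ~ margin, data = data[data$margin > 0, ])
  # Predicted log wealth exactly at the threshold from each side
  at_zero <- data.frame(margin = 0)
  lost_at_zero <- unname(predict(lost_fit, newdata = at_zero))
  won_at_zero <- unname(predict(won_fit, newdata = at_zero))
  c(log_jump = won_at_zero - lost_at_zero,
    pounds = exp(won_at_zero) - exp(lost_at_zero))
}

# Simulated data with a true jump of 0.6 log points at margin = 0
set.seed(7)
n <- 2000
sim <- data.frame(margin = runif(n, -0.5, 0.5))
sim$ln.gross <- 12 + 0.6 * (sim$margin > 0) + 0.5 * sim$margin +
  rnorm(n, sd = 0.8)

est <- rd_effect(sim)
est
```

Calling `rd_effect(mps)` after loading the seminar data would apply the same calculation to the real candidates. Its result is not reported here.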