2 Measurement Theory and Error

Topics: Definition of measurement error. What does it mean for measurements to be fair or unfair? Measurements as functions of indicators. Consequences of measurement error for subsequent analyses.

Required reading:

2.1 Seminar

You can also load the data files directly into R from the web with the following commands:

region_data <- read.csv(url("https://uclspp.github.io/POLS0013/4_data/week-2-region.csv"))
constituency_data <- read.csv(url("https://uclspp.github.io/POLS0013/4_data/week-2-constituency.csv"))

In this assignment, we are going to use a combination of real data and simulations to better understand the consequences of measurement error in an explanatory variable. We are also going to create a new measure for the first time, albeit not a very good one.

The core measurement problem of the assignment is measuring the results of the 2016 referendum on EU membership in the UK at the level of UK parliamentary constituencies. This is a measurement problem because the UK reported the results of the 2016 referendum on a different electoral geography from the parliamentary constituencies used in the 2015 and 2017 general elections. The 2016 referendum was reported at the level of local authorities, of which there are 380 in England, Scotland and Wales. That same area includes 632 parliamentary constituencies, each of which sends one MP to Parliament. For a number of applications, it is useful to have a measure of how the voters in a constituency voted in the referendum, but there is no official answer, so we have to come up with a measurement strategy.

The best available measures were developed by Hanretty (2017), and are much more accurate than anything we can do easily here because they use details of the geographic/demographic overlap between the different boundaries to aid in imputation. We will treat these as if they are the right answer, even though they are not exactly the right answer.

We are going to use a very simple measurement strategy to develop our own measures. Both local authorities and parliamentary constituencies are nested within the larger geography of (NUTS 1) UK regions, of which there are 11 in England (9), Scotland (1), and Wales (1). Here is the measurement strategy we are going to follow:

  1. Fit a regression model predicting 2016 leave share using 2015 general election vote for the UK Independence Party at the region level. This is a regression with eleven data points and one explanatory variable.¹
  2. Construct fitted values from the regression model for all 632 constituencies in England, Scotland and Wales using the 2015 general election UKIP vote in each constituency.

This is an example of a measurement strategy based on calibrating the relationship between a set of indicators (vote in the 2015 election) and the target of the measurement (vote in the 2016 referendum) and then extrapolating to a new set of units. It lets us measure what we want to measure because, whereas the 2016 referendum vote is not available on the geographic boundaries we are interested in (parliamentary constituencies), the 2015 election vote is available on those boundaries. Because both votes are available at the region level, we can use that level of geography to “translate” between the two. How well this works depends on how strong the relationship is between the indicators and the target of the measurement, and on whether that relationship in the units on which we train or calibrate the model (regions) is similar to the relationship in the units to which we apply the model to construct fitted values (constituencies).
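The calibrate-then-extrapolate logic can be sketched on simulated data. This is a minimal illustration, not the assignment's data or solution: the data frames `train` and `apply_units`, the variable names `indicator` and `target`, and all numbers are assumptions made up for the example.

```r
# Simulated illustration only: all names and numbers below are made up.
set.seed(1)

# "Training" units (e.g. regions): both indicator and target are observed.
train <- data.frame(indicator = runif(11, min = 5, max = 20))
train$target <- 30 + 1.5 * train$indicator + rnorm(11, sd = 2)

# "Application" units (e.g. constituencies): only the indicator is observed.
apply_units <- data.frame(indicator = runif(632, min = 0, max = 35))

# Step 1: calibrate the indicator-target relationship on the training units.
calib <- lm(target ~ indicator, data = train)

# Step 2: extrapolate to the new units with predict() and newdata.
measure_predict <- predict(calib, newdata = apply_units)

# Equivalent manual construction from the estimated coefficients.
b <- coef(calib)
measure_manual <- b["(Intercept)"] + b["indicator"] * apply_units$indicator

# The two constructions agree up to floating-point error.
all.equal(unname(measure_predict), unname(measure_manual))
```

Note that the simulated application units span a wider range of the indicator than the training units do, so some fitted values are extrapolations beyond the range on which the model was calibrated.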

  1. Load the region data file chapter-2-region.csv and plot the variable Leave16 (y) as a function of UKIP15 (x).
  2. Fit a simple linear regression predicting Leave16 using UKIP15. Interpret the coefficient on UKIP15. What do we learn from the \(R^2\) of the model? Add a regression line to your plot from Q1.
  3. Load the constituency level data file chapter-2-constituency.csv and construct fitted values for each constituency using the variable UKIP15 from that file plus the fitted model. You can either do this by manually constructing the fitted values using the estimated coefficients for the intercept and UKIP15 or using a predict() command with the newdata argument set to the constituency dataframe.
  4. Plot the fitted values you constructed in Q3 against Chris Hanretty’s estimates (the variable Leave16_Hanretty from chapter-2-constituency.csv). Add a line to the plot with intercept 0 and slope 1. What does this line correspond to? What do the deviations of the points from the line correspond to?
  5. Why is there a horizontal row of data points at the bottom of the plot? Hint: Compare the value of these points to the coefficients from the regression model we used to form our measure.
  6. What are the mean and standard deviation of our measurement errors (if Chris Hanretty’s estimates are correct)? What is the correlation between the measurement error and the Hanretty estimates?
  7. How much of \(\mu\) is in \(m\)? Calculate and report the correlation, \(R^2\) as well as Kendall’s \(\tau\) between our measure and Hanretty’s estimates.
  8. Another measurement strategy that we could have followed would be to simply assume that all constituencies in each region had the same Leave vote share as the region overall. One way to do this is with the following command (change the data and newdata arguments to match how you have saved your datasets).
predict(lm(Leave16 ~ Region, data = region_data), newdata = constituency_data)

Evaluate whether this Region-based measure is a better or worse measure of Leave vote share in constituencies than the UKIP15-based measure we constructed previously, by comparison to Chris Hanretty’s estimates. To do this, you will need to repeat some of the above analyses that we did for the UKIP15-based measure and then make a judgment call based on what you find.
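To see why a regression on a region factor assigns every unit its region's value, and how a measure can be summarised against a benchmark, here is a small sketch on simulated data. The data frames `regions` and `areas`, the helper `evaluate_measure()`, and all numbers are hypothetical and are not part of the course materials.

```r
# Simulated illustration only: every name and number here is hypothetical.
set.seed(2)

# One row per region, so a regression on the Region factor reproduces each
# region's observed value exactly.
regions <- data.frame(Region = c("A", "B", "C"), y = c(40, 52, 60))

# Areas nested in regions, with a benchmark value treated as the truth.
areas <- data.frame(Region = sample(regions$Region, 300, replace = TRUE))
areas$truth <- regions$y[match(areas$Region, regions$Region)] +
  rnorm(300, sd = 6)

# Every area receives the value of the region it sits in.
areas$m_region <- unname(
  predict(lm(y ~ Region, data = regions), newdata = areas)
)

# Summaries of a measure m against a benchmark mu.
evaluate_measure <- function(m, mu) {
  err <- m - mu
  c(
    mean_error      = mean(err),
    sd_error        = sd(err),
    cor_error_truth = cor(err, mu),
    correlation     = cor(m, mu),
    r_squared       = cor(m, mu)^2,  # R^2 of a simple regression of mu on m
    kendall_tau     = cor(m, mu, method = "kendall")
  )
}

round(evaluate_measure(areas$m_region, areas$truth), 3)
```

Applying the same helper to two competing measures gives a side-by-side comparison, but which summary matters most remains a judgment call that depends on how the measure will be used.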

  9. Now imagine we want to study patterns in the Conservative Party’s gains between the 2015 and 2017 elections. We have a theory that the Conservative Party will have gained more votes in places that supported Leave in the referendum to a greater extent. We want to assess how strong that relationship is. Estimate three regression models for the change in Conservative vote share between 2015 and 2017, one using our UKIP15-based estimates of the Leave vote on constituency boundaries, one using our Region-based estimates, and one using the Hanretty estimates. You will need to construct the variable for the Conservative vote share change from Con15 and Con17 in the constituency data.
  10. Given the material in the lecture, what are some possible explanations for the differences in the coefficients from Q9 (again, maintaining the assumption that the Hanretty estimates are actually correct)?
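As a hedged illustration of how measurement error in an explanatory variable can move a regression coefficient, the simulation below compares a regression on the true variable with regressions on two error-laden versions of it. Everything here is assumed for the example (the variable names, the error structures, and the true slope of 0.5), and these are only two of many possible error patterns, not a diagnosis of the measures built in this assignment.

```r
# Simulated illustration only: names, slopes and error structures are made up.
set.seed(3)
n <- 5000

mu <- rnorm(n, sd = 10)            # true explanatory variable
y  <- 0.5 * mu + rnorm(n, sd = 5)  # outcome depends on the truth

# Classical error: noise added to the truth and unrelated to it.
m_classical <- mu + rnorm(n, sd = 10)

# Compressed measure: understates the variation in the truth, so its error
# (m - mu) is negatively correlated with mu.
m_compressed <- 0.5 * mu

coefs <- c(
  truth      = coef(lm(y ~ mu))[["mu"]],
  classical  = coef(lm(y ~ m_classical))[["m_classical"]],
  compressed = coef(lm(y ~ m_compressed))[["m_compressed"]]
)
round(coefs, 3)
```

In this setup the classical-error slope is pulled toward zero (in expectation by the factor \(\sigma^2_\mu / (\sigma^2_\mu + \sigma^2_\epsilon)\), which is one half here), while the compressed measure yields a slope exactly twice the one obtained with the truth. Measurement error can therefore bias a coefficient in either direction, depending on how the error relates to the true values.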

References

Hanretty, Chris. 2017. “Areal Interpolation and the UK’s Referendum on EU Membership.” Journal of Elections, Public Opinion and Parties 27 (4): 466–83.

  1. As I said, this is not a very good measurement strategy, although it isn’t terrible either. If we had more data points, we might include more explanatory variables (indicators) such as vote for the Conservative Party, etc.