1 Measurement: What, Why and How
Topics: What is measurement? Why is measurement important? Representative versus pragmatic measurement. How are social measurements used and misused?
Required reading:
- Chapters 1 & 2, Pragmatic Social Measurement
Further reading:
- David J. Hand, “Measurement: A Very Short Introduction”, Oxford University Press, 2016.
- Ethan Bueno de Mesquita, “The Aims of Public Policy Address: The Perils of Quantification”
1.1 Seminar
For this week’s class we are going to look at existing measures of the extent to which different countries were “democratic” in different years (1946-2008). The data set that we are using was compiled for a project that created a synthetic measure of how democratic countries were by combining the information in all the different measures (Pemstein, Meserve, and Melton 2010). The measures in the data set are on different scales, were constructed by different authors according to different coding rules, and cover different countries and years. This assignment is mostly aimed at reminding you how to do data analysis in R.
Remember that .Rdata files are loaded into R with the load() command. You can directly load the data file into R from the web with the following command:
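As a reminder of how load() works, here is a minimal sketch using a temporary local file. The toy data frame is made up for illustration, and the commented-out url() line uses a placeholder rather than the real address of the course data.

```r
# Hypothetical toy data frame, standing in for the real course data.
toy <- data.frame(country = c("A", "B"), year = c(2000, 2001))

# Save it to a temporary .Rdata file, then remove it from the workspace.
path <- tempfile(fileext = ".Rdata")
save(toy, file = path)
rm(toy)

# load() restores objects under their saved names and
# (invisibly) returns a character vector of those names.
loaded <- load(path)
print(loaded)
str(toy)

# For a file on the web, pass a url() connection instead of a path, e.g.
# load(url("<address of the .Rdata file>"))
```

Note that load() does not let you choose the object name: the data frame reappears under whatever name it had when it was saved.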
- Before you look at the data set, consider whether you think of countries being democratic as a binary quantity or not. Is it the case that a country is either democratic or not, with no middle ground? Or is it a continuum, with countries varying widely in how democratic they are? Can a country be somewhat democratic?
- Load the data file “week-1-democracy.Rdata” into R. The first three variables in the data frame “democracy” are:
  - cowcode: COW country codes according to the Correlates of War Project
  - country: country name
  - year: calendar year

  If you look at the top of the data file using head(democracy) you will see 12 further variables (in alphabetic order from arat to vanhanen). Each of these corresponds to a different measure of democracy. You might recognise some of the names (e.g. freedomhouse and polity are relatively well-known and widely used). The coverage of country-years varies by measure.

  For each of the 12 measures, use the data set to calculate the range of scores used for that measure.
- In addition to the range of scores, we might want to know whether the distributions of scores look similar across the different measures. Generate a histogram of the scores for each measure. Do they look generally similar or not?
- The polity score has integer values from -10 to 10. Calculate the proportion of country-years that are classified as democratic by pacl, among country-years with each value of the polity score.
- Plot the results of the previous question by polity score. Describe the association you see between the two scores.
- Which are the country-years that are classified as a 10 by polity and 0 (non-democratic) by pacl? Take the last of these in the data set, and figure out whether the polity or the pacl score changed in the subsequent year. What happened in that country in that year? Hint: There are many ways to achieve this. And it’s always a good idea to get inspiration from the internet.
- Plot the trajectory of the polity scores and the pacl scores for the country in question across the full set of years in the data set.
- Use the command cor(democracy[,4:15],use = "pairwise.complete.obs") to calculate the correlation table for the 12 measures. You may want to wrap that in a round(x,2) command to make it easier to read. Note that the use = argument is needed because not all the measures are available in all the country-years, so we just calculate correlations between measures for the country-years where both are available. What does a higher correlation mean in this context? What does a low correlation mean in this context?
- Overall, do these seem like big disagreements between measures or small disagreements between measures? Are you surprised at how much different measures agree or at how much they disagree?
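The data-handling steps in the questions above can be sketched on a small made-up data frame. The sketch assumes a data frame laid out like democracy (cowcode, country and year first, then the measures), but toy and every value in it are invented purely for illustration and say nothing about the real scores. It is one possible approach among many, not a model answer.

```r
# Hypothetical toy data: three identifier columns, then three measures.
# All values are made up; NA marks a country-year a measure does not cover.
toy <- data.frame(
  cowcode      = c(1, 1, 1, 2, 2, 2, 3, 3),
  country      = c("A", "A", "A", "B", "B", "B", "C", "C"),
  year         = c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001),
  polity       = c(-10, -10, 10, 10, 10, -10, 10, NA),
  pacl         = c(0, 0, 1, 1, 0, 0, 1, 1),
  freedomhouse = c(1, 2, 7, 6, NA, 1, 7, 5)
)

# Range of each measure: apply range() column by column, ignoring NAs.
# The result is a matrix with one column per measure (row 1 = min, row 2 = max).
ranges <- sapply(toy[, 4:6], range, na.rm = TRUE)
print(ranges)

# Proportion classified as democratic by pacl (coded 0/1) at each polity value.
# The mean of a 0/1 variable is a proportion; rows with missing polity drop out.
prop_dem <- tapply(toy$pacl, toy$polity, mean, na.rm = TRUE)
print(prop_dem)

# Country-years where the two measures disagree sharply.
# which() discards comparisons that evaluate to NA.
disagree <- toy[which(toy$polity == 10 & toy$pacl == 0), c("country", "year")]
print(disagree)

# Correlation table using, for each pair of measures, only the
# country-years where both are observed; rounded for readability.
cors <- round(cor(toy[, 4:6], use = "pairwise.complete.obs"), 2)
print(cors)
```

On the real data, the column selection 4:6 would become 4:15, as in the cor() command above, and hist() or plot() can be applied to the same objects for the plotting questions.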