9 Unsupervised Scale Measurement II: Categorical Indicators
Topics: Item response theory for data with binary and/or ordinal indicators.
Required reading:
- Chapter 12, Pragmatic Social Measurement
Further reading:
Theory
- Bartholomew et al. (2008), Ch 8 & 9
Applications
- Treier, S., & Jackman, S. (2008). Democracy as a Latent Variable. American Journal of Political Science, 52(1), 201-217.
- Treier, S., & Hillygus, D. S. (2009). The Nature of Political Ideology in the Contemporary Electorate. Public Opinion Quarterly, 73(4), 679-703.
- Treier, S. (2010). Where Does the President Stand? Measuring Presidential Ideology. Political Analysis, 18(1), 124-136.
9.1 Seminar
The data for this assignment are an extract from the 2017 British Election Study, conducted using face-to-face interviews. We are going to consider strategies for analysing these data, both by tallying up correct and incorrect answers and by fitting IRT models.
There are a number of questions we might use on this survey that are suitable for analysis with IRT, and the data extract includes a couple of extra sets that you can use if you are interested. The one that we will focus on is the “battery” of questions on political knowledge. These are six items, on which respondents could give True, False, or “Don’t Know” responses:
- (`x01_1`) Polling stations close at 10.00pm on election day.
- (`x01_2`) No-one may stand for parliament unless they pay a deposit.
- (`x01_3`) Only taxpayers are allowed to vote in a general election.
- (`x01_4`) The Liberal Democrats favour a system of proportional representation for Westminster elections.
- (`x01_5`) MPs from different parties are on parliamentary committees.
- (`x01_6`) The number of members of parliament is about 100.
The correct answers to these six items are True, True, False, True, True, and False, respectively.
Use the following code to tidy up these six items and re-order the factor levels to run `c("False", "Don't know", "True")`. Note that these are not the R logical values `TRUE` and `FALSE`; they are the character strings of the responses given in the survey.
```r
knowledge_battery <- bes[, c("x01_1", "x01_2", "x01_3", "x01_4", "x01_5", "x01_6")]
for (k in 1:ncol(knowledge_battery)) {
  # There are few "Not stated" responses, so it is easier to treat them as "Don't know"
  knowledge_battery[, k] <- replace(knowledge_battery[, k],
                                    knowledge_battery[, k] == "Not stated",
                                    "Don't know")
  knowledge_battery[, k] <- droplevels(knowledge_battery[, k])
  knowledge_battery[, k] <- relevel(knowledge_battery[, k], "False")
}
```
You can quickly see the distribution of responses using the following command:
```r
library(kableExtra)
kable(sapply(knowledge_battery, table), booktabs = TRUE) %>%
  kable_styling(full_width = FALSE)
```
|            | x01_1 | x01_2 | x01_3 | x01_4 | x01_5 | x01_6 |
|------------|------:|------:|------:|------:|------:|------:|
| False      |    91 |   489 |  1932 |   175 |   120 |  1528 |
| Don't know |   184 |   596 |   115 |   931 |   690 |   457 |
| True       |  1919 |  1109 |   147 |  1088 |  1384 |   209 |
For this assignment, you will also need to install and load the R library `ltm` (e.g. `install.packages("ltm")` followed by `library(ltm)`).
- Use the object `knowledge_battery` to create a new object `binary_correct`, which is 1 for responses that are correct and 0 for responses that are don't know or incorrect. The object you create should have the same dimensions and column/row order as `knowledge_battery`. Create a histogram of the total number of correct answers by respondent.
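One way to do the scoring is to compare each column against the answer key given above. This is a sketch: the helper `score_binary` and the object name `answers` are my own choices, not part of the assignment.

```r
# Correct answers to x01_1 ... x01_6, as listed above
answers <- c("True", "True", "False", "True", "True", "False")

# Score a battery of responses against an answer key:
# 1 for a correct response, 0 for "Don't know" or a wrong answer
score_binary <- function(battery, key) {
  out <- sapply(seq_along(key), function(k) as.numeric(battery[, k] == key[k]))
  dimnames(out) <- dimnames(battery)  # keep the original row/column order
  out
}
```

Applied to the data, `binary_correct <- score_binary(knowledge_battery, answers)`, and the histogram is then `hist(rowSums(binary_correct), breaks = seq(-0.5, 6.5, by = 1))`.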
- Use the object `knowledge_battery` to create a new object `ternary_correct`, which is 1 for responses that are correct, 0 for responses that are don't know, and -1 for responses that are incorrect. The object you create should have the same dimensions and column/row order as `knowledge_battery`. Create a histogram of the total number of “points” for each respondent, where each incorrect response counts as -1, each don't know response counts as 0, and each correct response counts as 1.
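The three-way scoring can be sketched in the same style (again, the helper name `score_ternary` is my own choice):

```r
# Score a battery: 1 = correct, 0 = "Don't know", -1 = incorrect
score_ternary <- function(battery, key) {
  out <- sapply(seq_along(key), function(k) {
    ifelse(battery[, k] == "Don't know", 0,
           ifelse(battery[, k] == key[k], 1, -1))
  })
  dimnames(out) <- dimnames(battery)
  out
}
```

With the answer key stated above, `ternary_correct <- score_ternary(knowledge_battery, c("True", "True", "False", "True", "True", "False"))`, and the histogram of points is `hist(rowSums(ternary_correct), breaks = seq(-6.5, 6.5, by = 1))`.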
- What potential problem with counting correct answers is addressed by counting incorrect responses as -1 rather than 0 if the goal is to create a measure of political knowledge for respondents to the survey?
- Use the R function `ltm()` to fit a binary IRT model using `binary_correct` as the data. Print a summary of the coefficients from the model and use the default plot method for the resulting fitted model object, e.g. `plot(ltm_fit)`, to see the item response/characteristic curves for all six items. Which was the “easiest” knowledge item? Which was the most difficult knowledge item?
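The fitting step looks something like the following sketch. The object name `ltm_fit` is just a choice; `z1` is `ltm()`'s name for the single latent trait in its formula interface.

```r
library(ltm)

# Two-parameter logistic IRT model: one difficulty and one
# discrimination parameter per item, single latent trait z1
ltm_fit <- ltm(binary_correct ~ z1)

summary(ltm_fit)  # coefficient estimates and standard errors
plot(ltm_fit)     # item characteristic curves for all six items
```

Under the usual parameterisation, items with lower difficulty estimates are "easier": respondents at any given level of the latent trait are more likely to answer them correctly.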
- Recover the “factor scores” (latent variable estimates) for the binary response IRT model that we just fit, using `ltm`'s `factor.scores()` function. Once you have done this, `ltm_scores$score.dat$z1` will give you the score for each respondent in the original data set. Plot these scores against the number of correct responses per respondent. Explain the pattern that you see and what it tells us about the validity of the two approaches (tallying up correct/incorrect responses vs fitting an IRT model) for measuring political knowledge from these response data. Hint: to get the number of correct responses per respondent you can sum up each row with `rowSums()`. When plotting, you can use `jitter()` to add some random noise around the number of correct responses (otherwise the points will overlap and the plot will not be very readable).
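A sketch of the scoring step, assuming `ltm_fit` and `binary_correct` from earlier (the name `ltm_scores` matches the `ltm_scores$score.dat$z1` expression above):

```r
library(ltm)

# Factor scores for each observed response pattern; passing the data
# as resp.patterns maps scores back onto the rows of the original data
ltm_scores <- factor.scores(ltm_fit, resp.patterns = binary_correct)
z <- ltm_scores$score.dat$z1

# Scores against the (jittered) number of correct answers
plot(jitter(rowSums(binary_correct)), z,
     xlab = "Number of correct answers (jittered)",
     ylab = "IRT factor score")
```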
- Use the R function `grm()` to fit a three-category ordered IRT model using `knowledge_battery` as the data. Print a summary of the coefficients from the model and use the default plot method for the resulting fitted model object, e.g. `plot(grm_fit)`, to see the item response/characteristic curves for all six items.
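A sketch of the fitting step (the object name `grm_fit` is just a choice):

```r
library(ltm)

# Graded response model on the three categories; grm() uses the
# factor level order, which we set earlier to False < Don't know < True
grm_fit <- grm(knowledge_battery)

summary(grm_fit)
plot(grm_fit)  # category response curves for all six items
```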
- Examine the signs of the discrimination parameters. What do these tell us about the patterns of responses? What do these tell us about what the latent variable is measuring for each respondent?
- Repeat the analysis you did in Q5, but this time for the scores from the ordered IRT model, comparing them to `rowSums(ternary_correct)` instead of `rowSums(binary_correct)`. Construct the factor scores with `factor.scores()` as before.
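The scores for the ordered model can be constructed analogously (a sketch, assuming `grm_fit` and `ternary_correct` from earlier; `grm_scores` is just a name):

```r
library(ltm)

# Factor scores for the ordered (graded response) model,
# mapped back onto the original respondents via resp.patterns
grm_scores <- factor.scores(grm_fit, resp.patterns = knowledge_battery)
z_grm <- grm_scores$score.dat$z1

plot(jitter(rowSums(ternary_correct)), z_grm,
     xlab = "Points (jittered)", ylab = "Ordered IRT factor score")
```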
- Plot the scores from the binary model against those of the ternary model. Discuss what you observe.
- Do you think that it makes sense to treat these response data as ordinal? Does “don’t know” necessarily reflect an intermediate level of knowledge between that of getting the answer wrong and that of getting the answer right?