## 9.2 Solutions

### 9.2.1 Exercise 1

Create a new file called `assignment9.R`

in your `PUBL0055`

folder and write all the solutions in it.

In RStudio, go to the menu and select **File** > **New File** > **R Script**

Make sure to clear the environment and set the working directory.

```
rm(list = ls())
setwd("~/PUBL0055")
```

Go to the menu and select **File** > **Save** and name it `assignment9.R`

Next, we load all the packages we need for these exercises.

```
library(foreign)
library(texreg)
```

### 9.2.2 Exercise 2

Load the Swiss gun control survey dataset `SwissData2011.dta`

from your `PUBL0055`

folder or from the datasets repository

### 9.2.3 Exercise 3

Create plots of `age`

vs `VoteYes`

and `LeftRight`

vs `VoteYes`

. Add a regression line (from a linear probability model) to the plots

```
linear_model <- lm(VoteYes ~ age, data = swiss)
plot(
VoteYes ~ age,
data = swiss,
pch = 20,
col = "LightSkyBlue",
frame.plot = FALSE,
xlab = "Age",
ylab = "Vote (1 = yes, 0 = no)",
main = "Voting Yes on Assault Rifle Ban by Age"
)
abline(linear_model, col = "red")
```

```
linear_model <- lm(VoteYes ~ age, data = swiss)
plot(
VoteYes ~ LeftRight,
data = swiss,
pch = 20,
col = "LightSkyBlue",
frame.plot = FALSE,
xlab = "Left-Right Self Placement",
ylab = "Vote (1 = yes, 0 = no)",
main = "Voting Yes on Assault Rifle Ban by Ideology"
)
abline(linear_model, col = "red")
```

### 9.2.4 Exercise 4

Estimate a logistic regression model which includes `age`

and `LeftRight`

as predictors, and `VoteYes`

as the dependent variable

```
logit_model1 <- glm(
VoteYes ~ age + LeftRight,
data = swiss,
family = binomial(link="logit")
)
screenreg(logit_model1)
```

```
===========================
Model 1
---------------------------
(Intercept) 5.01 ***
(0.42)
age -0.02 ***
(0.01)
LeftRight -0.81 ***
(0.06)
---------------------------
AIC 935.71
BIC 950.13
Log Likelihood -464.86
Deviance 929.71
Num. obs. 902
===========================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

### 9.2.5 Exercise 5

What is the direction of the relationship between `LeftRight`

on `VoteYes`

? Is this association significant?

The coefficient is negative, implying that people who are more conservative (higher values indicate more right-wing people) are less likely to vote in favour of the ban, holding age constant. The coefficient is very large relative to the standard error, and is significant at all conventional levels.

### 9.2.6 Exercise 6

What is the direction of the relationship between `age`

on `VoteYes`

? Is this association significant?

The coefficient is negative, implying that people who are older are less likely to vote in favour of the ban, holding ideology constant. The coefficient is again large relative to the standard error, and is significant at all conventional levels. In short, voting yes on the ban also becomes less likely with increasing age.

### 9.2.7 Exercise 7

What is the effect on *the probability of a “yes” vote* of moving from a left-right self placement of 5 to a self placement of 6 for an individual who is 44 years old?

Calculate the predicted probability of yes-vote for left-right 5 and age 44

```
leftright_5 <- predict(
logit_model1,
newdata = data.frame(age = 44, LeftRight = 5),
type = "response"
)
leftright_5
```

```
1
0.5259904
```

Calculate the predicted probability of yes-vote for left-right 6 and age 44

```
leftright_6 <- predict(
logit_model1,
newdata = data.frame(age = 44, LeftRight = 6),
type = "response"
)
leftright_6
```

```
1
0.3302643
```

Now let’s calculate the first differences

`leftright_5 - leftright_6`

```
1
0.1957262
```

The probability of voting in favour decreases by 19.57 percentage points from 52.60% to 33.03%. Assuming a treshold of 0.5, we would predict that the 44 years old respondent who is at 5 on the ideology scale would vote for the ban. The respondent who is of the same age but at 6 on the ideology scale would vote against the ban.

### 9.2.8 Exercise 8

Calculate and plot the predicted probability of voting yes across the range of the `age`

variable for individuals who have a left-right self placement of 5. Do the same across the range of the `LeftRight`

variable for individuals with an age of 50.

First, we calculate the predicted probability of voting yes across the range of the `age`

variable for individuals who have a left-right self placement of 5:

```
age_range <- seq(min(swiss$age), max(swiss$age), length.out = 100)
leftright_5 <- predict(
logit_model1,
newdata = data.frame(age = age_range, LeftRight = 5),
type = "response"
)
plot(
VoteYes ~ age,
data = subset(swiss, LeftRight == 5),
pch = 20,
col = "LightSkyBlue",
frame.plot = FALSE,
xlab = "Age",
ylab = "Vote (1 = yes, 0 = no)",
main = "Voting Yes on Assault Rifle Ban by Age"
)
# draw a line at 0.5 as reference (optional)
abline(h=.5, lty = "dashed", col = "DarkGrey")
text(x = 80, y= .5, pos = 3, "p = 0.5")
lines(x = age_range, y = leftright_5, col = "red")
```

`range(leftright_5)`

`[1] 0.3080830 0.6477591`

We predict that the probability of voting for the ban decreases with age for a respondent with centrist ideology (left-right = 5) from 30.81% to 64.78%. This is a substantial effect.

Now let’s do the same across the range of the `LeftRight`

variable for individuals with an age of 50

```
leftright_range <- seq(
min(swiss$LeftRight, na.rm = TRUE),
max(swiss$LeftRight, na.rm = TRUE),
length.out = 100
)
age_50 <- predict(
logit_model1,
newdata = data.frame(age = 50, LeftRight = leftright_range),
type = "response"
)
plot(
VoteYes ~ LeftRight,
data = subset(swiss, age == 50),
pch = 20,
col = "LightSkyBlue",
frame.plot = FALSE,
xlab = "Left-Right Self Placement",
ylab = "Vote (1 = yes, 0 = no)",
main = "Voting Yes on Assault Rifle Ban by Ideology"
)
# draw a line at 0.5 as reference (optional)
abline(h=.5, lty = "dashed", col = "DarkGrey")
text(x = 8, y= .5, pos = 3, "p = 0.5")
lines(x = leftright_range, y = age_50, col = "red")
```

We predict that the probability of voting for the ban decreases the more conservative the respondent. For a 50 year old respondent, who is on the extreme left of the ideology scale, we predict a 98.28% probability of voting yes. The predicted probability of voting in favour decreases rapidly the more conservative the respondent, to 3.71% at the extreme right of the scale.

### 9.2.9 Exercise 9

Looking at the other variables in the `swiss`

data, which do you think might be important determinants of a “yes” vote in the referendum? Write a short paragraph justifying the importance of 3 of the predictors in *theoretical* terms. Include the additional explanatory variables you have selected in a new model that also includes `age`

and `LeftRight`

.

For this exercise, we include `university`

, `german`

, and `urban`

in our model (though you could, of course, have chosen other variables).

We conjecture that repsondents who are university educated are more liberal in general and more in favour of gun control specifically. University classes, especially those on statistics, teach that gun laws are not beneficial for society as a whole in the sense that freely available guns do not prevent crime but are potentially harmful. Furthermore, we control for the cultural language divide in Switzerland, where the German speaking areas are potentially more conservative and more in favour of liberal gun laws. We control for urbanization because we suspect that rural areas are more status quo oriented, whereas urban areas favour tighter regulation of guns.

```
logit_model2 <- glm(
VoteYes ~ age + LeftRight + university + german + urban,
data = swiss,
family = binomial(link="logit")
)
screenreg(list(logit_model1, logit_model2))
```

```
========================================
Model 1 Model 2
----------------------------------------
(Intercept) 5.01 *** 4.64 ***
(0.42) (0.44)
age -0.02 *** -0.02 ***
(0.01) (0.01)
LeftRight -0.81 *** -0.80 ***
(0.06) (0.06)
university 0.67 ***
(0.19)
german -0.12
(0.17)
urban 0.42 *
(0.18)
----------------------------------------
AIC 935.71 917.79
BIC 950.13 946.61
Log Likelihood -464.86 -452.90
Deviance 929.71 905.79
Num. obs. 902 901
========================================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

### 9.2.10 Exercise 10

Provide some predicted probabilities from the model that illustrate the substantive importance of the new variables that you have added.

The effect of `german`

in our new model is a statistical zero (notice that the standard error is larger in absolute terms than the coefficient). As expected, university educated respondents are more in favour of the assault rifle regulation. Similarly, the more urban the canton, the respondent lives in, the more likely the respondent is to be in favour of the regulation.

Comparing our two models, we conclude that none of the new variables were confounding the effects of age and ideology. The effects of age and ideology are substantially unchanged.

We illustrate the effect of higher education and urbanisation in a plot. We vary urbanization from its minimum to its maximum. We plot two lines one for respondents with higher education and one for respondents without university education. We will keep constant `age`

, `LeftRight`

, and `german`

and the appropriate measures of central tendency.

First, let’s calculate predicted probabilities for those with a university degree

```
urban_range <- seq(min(swiss$urban), max(swiss$urban),length.out = 100)
uni_degree <- data.frame(
university = 1,
age = mean(swiss$age),
LeftRight = mean(swiss$LeftRight, na.rm = TRUE),
german = median(swiss$german),
urban = urban_range
)
uni_degree$pps <- predict(
logit_model2,
newdata = uni_degree,
type = "response"
)
```

Now we do the same for voters without a university degree

```
no_uni_degree <- data.frame(
university = 0,
age = mean(swiss$age),
LeftRight = mean(swiss$LeftRight, na.rm = TRUE),
german = median(swiss$german),
urban = urban_range
)
no_uni_degree$pps <- predict(
logit_model2,
newdata = no_uni_degree,
type = "response"
)
```

```
plot(
pps ~ urban,
data = uni_degree,
type = "l",
ylim = c(0,1),
frame.plot = FALSE,
main = "Effects of Urbanisation and Higher Education on Yes-Vote",
xlab = "Share of citizens living in urban areas",
ylab = "Predicted Probability of Yes-Vote"
)
# draw a line at 0.5 as reference (optional)
abline(h=.5, lty = "dashed", col = "DarkGrey")
text(x = 0.9, y= .5, pos = 1, "p = 0.5")
lines(pps ~ urban, data = no_uni_degree, col = "red")
```

`range(uni_degree$pps)`

`[1] 0.6040835 0.6988812`

`range(no_uni_degree$pps)`

`[1] 0.4376645 0.5421051`

We show the effects of urbanisation and higher education for a centrist respondent with mean age (49 years old), from the German speaking part of Switzerland. An average respondent such as this without a university degree living in the most rural part of Switzerland is predicted to vote yes with 44/% probabilty. The same respondent with higher education is predicted to vote yes with probability 60%. At the other extreme end of urbanization the predicted probability of a yes-vote for a respondent without higher education is 54%. The respondent with a university degree is predicted to vote yes with 70% probability.

### 9.2.11 Exercise 11

Compare the percentage correctly predicted from the simple model that you estimated with just `LeftRight`

and `age`

as predictors to the percentage correctly predicted from the model with the additional explanatory variables that you included in the question above. Which has the greater explanatory power?

First, we calculate the predicted probabilities for each observation in our sample. We do this twice, once for the `logit_model1`

model (which only includes age and Left-Right placement as explanatory variables), and once for the `logit_model2`

model (which includes age, Left-Right, university, german and urban as explanatory variables).

First, calculate predicted probabilities for each observation

`pps <- predict(logit_model1, newdata = swiss, type = "response")`

Next, convert the predicted probabilities to expected values of `0`

or `1`

`evs <- ifelse(pps > 0.5, yes = 1, no = 0)`

Next we calculate the confusion matrix

```
confusion <- table(actual = swiss$VoteYes, expected.value = evs)
confusion
```

```
expected.value
actual 0 1
0 347 96
1 130 329
```

Finally, calculate the percentage correctly predicted for the model

`sum(diag(confusion)) / sum(confusion)`

`[1] 0.7494457`

We can do the same for the second model now

```
pps <- predict(logit_model2, newdata = swiss, type = "response")
evs <- ifelse(pps > 0.5, yes = 1, no = 0)
confusion <- table(actual = swiss$VoteYes, expected.value = evs)
sum(diag(confusion)) / sum(confusion)
```

`[1] 0.7524972`

There is a very small difference in predictive power between the two models. The first, simpler model correctly predicts the outcome for 74.94% of observations, and this increases to only 75.25% of observations in the more extensive model. This is a small increase in predictive accuracy for a model than includes 3 additional covariates.