## 2.2 Solutions

### 2.2.1 Exercise 1

Create a new file called `assignment2.R` in your `PUBL0055` folder and write all the solutions in it.

In RStudio, go to the menu and select **File** > **New File** > **R Script**.

Make sure to clear the environment and set the working directory.

```
rm(list = ls())
setwd("~/PUBL0055")
```

Go to the menu, select **File** > **Save**, and name the file `assignment2.R`.

### 2.2.2 Exercise 2

Clear the workspace and set the working directory to your PUBL0055 folder.

```
rm(list = ls())
setwd("~/PUBL0055")
```

### 2.2.3 Exercise 3

Load the non-western foreigners dataset from your local drive into R.

`load("non_western_foreingners.RData")`
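`load()` restores objects under the names they were saved with, so it can help to check what actually arrived in the workspace. The following is a minimal sketch that uses a hypothetical toy data frame (`toy`) written to a temporary file, not the course dataset:

```r
# Toy stand-in for the course data (hypothetical values, not the real dataset)
toy <- data.frame(RAge = c(25, 40, 63), religious = c(0, 1, 0))

# Save it to a temporary .RData file, then remove it from the workspace
path <- tempfile(fileext = ".RData")
save(toy, file = path)
rm(toy)

# load() invisibly returns the names of the objects it restored
loaded <- load(path)
print(loaded)

# Inspect the structure of the first restored object
str(get(loaded[1]))
```

The same check should work after loading the real file: capture the return value of `load()` and inspect the named object with `str()` or `head()`.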

### 2.2.4 Exercise 4

What is the level of measurement for each variable in the non-western foreigners dataset?

Variable | Level of measurement |
---|---|
IMMBRIT | interval scaled (continuous) |
over.estimate | categorical with 2 categories, also called a binary variable, a dummy variable, or an indicator variable |
RSex | categorical with 2 categories |
RAge | interval scaled |
Househld | interval scaled |
paper | categorical with 2 categories |
WWWhourspW | interval scaled |
religious | categorical with 2 categories |
employMonths | interval scaled |
urban | ordinal |
health.good | ordinal |
HHInc | We do not have enough information to determine whether `HHInc` is interval scaled or ordinal. If the income bands are equally large, `HHInc` would be interval scaled. We will treat the variable as interval scaled. |
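One informal aid when classifying variables is to count the distinct values in each column: a column with exactly two values is a candidate binary variable. This is only a sketch on a hypothetical toy data frame, and the count alone cannot distinguish ordinal from interval variables; that still requires the codebook.

```r
# Hypothetical toy data; column names mirror the dataset but values are made up
toy <- data.frame(
  RAge      = c(25, 40, 63, 40, 71),
  religious = c(0, 1, 0, 0, 1),
  urban     = c(1, 4, 2, 3, 1)
)

# Number of distinct values in each column
n_distinct <- sapply(toy, function(x) length(unique(x)))
print(n_distinct)

# Columns with exactly two distinct values are candidate binary variables
binary_vars <- names(n_distinct)[n_distinct == 2]
print(binary_vars)
```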

### 2.2.5 Exercise 5

Calculate the correct measure of central tendency for `RAge`, `Househld`, and `religious`.

The correct measures of central tendency for the three levels of measurement are:

Level of measurement | Central tendency |
---|---|
categorical | Mode |
ordinal | Median |
interval | Mean |

`mean(fdata$RAge)`

`[1] 49.74547`

`mean(fdata$Househld)`

`[1] 2.391802`

`mean(fdata$religious)`

`[1] 0.4928503`

The mean of age is 49.75 and the mean of `Househld` is 2.39. The mode of `religious` is 0. Note: because `religious` is binary, its mean is the proportion of 1s, and that proportion tells us the mode. Since 49.29% of respondents are religious, slightly more than half are not, so the most frequent value is 0.
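Base R has no built-in function for the statistical mode (the function `mode()` reports an object's storage mode instead). One possible way to find it is through a frequency table. The vector below is hypothetical toy data, assuming the same 0/1 coding as `religious`:

```r
# Hypothetical binary responses (1 = religious, 0 = not religious)
religious <- c(0, 1, 0, 0, 1, 1, 0)

# Frequency table, then keep the value(s) with the highest count
counts <- table(religious)
stat_mode <- names(counts)[counts == max(counts)]
print(stat_mode)

# For a 0/1 variable the mean equals the share of 1s
share_ones <- mean(religious)
print(share_ones)
```

Here 0 occurs four times and 1 three times, so the mode is 0 and the share of 1s is 3/7. Note that `stat_mode` is returned as a character string because it is taken from the table's names.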

### 2.2.6 Exercise 6

Calculate the correct measure of dispersion for `RAge`, `Househld`, and `religious`.

`sd(fdata$RAge)`

`[1] 17.57245`

`sd(fdata$Househld)`

`[1] 1.339352`

`mean(fdata$religious)`

`[1] 0.4928503`

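The standard deviation of age is 17.57 and the standard deviation of the number of people in the respondent's household is 1.34. For the binary variable `religious`, we describe dispersion through the proportion in each category: 49.29% of the respondents are religious and 50.71% are not.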
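The category proportions for a binary variable can also be read directly from a frequency table with `prop.table()`. A minimal sketch on hypothetical toy data:

```r
# Hypothetical binary responses (1 = religious, 0 = not religious)
religious <- c(0, 1, 0, 0, 1, 1, 0, 1)

# Share of observations in each category
props <- prop.table(table(religious))
print(props)

# The same shares expressed as percentages
print(round(100 * props, 2))
```

With four 0s and four 1s, both categories have a share of 0.5.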

### 2.2.7 Exercise 7

How many respondents identify with the Greens?

```
# Convert the numeric party codes into a labelled factor
fdata$party_self <- factor(
  fdata$party_self,
  labels = c("Tories", "Labour", "SNP", "Greens", "Ukip", "BNP", "other")
)
```

One solution is to look at the frequency table:

`table(fdata$party_self)`

```
Tories Labour    SNP Greens   Ukip    BNP  other
   284    280     16     23     31     32    383
```

Another solution uses the `which()` function:

`length(which(fdata$party_self=="Greens"))`

`[1] 23`

23 respondents identify with the Green party.
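A third option is to sum the logical comparison directly, since `TRUE` counts as 1. The sketch below uses a hypothetical toy factor that includes a missing value to show one difference: `which()` silently drops `NA`, whereas `sum()` needs `na.rm = TRUE`.

```r
# Hypothetical party affiliations, including one missing value
party <- factor(c("Greens", "Labour", "Greens", NA, "Tories"))

# which() returns only the positions that are TRUE, so NA is dropped
n_which <- length(which(party == "Greens"))

# sum() counts TRUE values; na.rm = TRUE is needed when NAs are present
n_sum <- sum(party == "Greens", na.rm = TRUE)

print(c(n_which, n_sum))

# Without na.rm the result is NA
print(sum(party == "Greens"))
```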

### 2.2.8 Exercise 8

Calculate the variance and standard deviation of `IMMBRIT` for each party affiliation.

**Tories**

`var(fdata$IMMBRIT[fdata$party_self=="Tories"])`

`[1] 431.8308`

`sd(fdata$IMMBRIT[fdata$party_self=="Tories"])`

`[1] 20.78054`

**Labour**

`var(fdata$IMMBRIT[fdata$party_self=="Labour"])`

`[1] 444.8932`

`sd(fdata$IMMBRIT[fdata$party_self=="Labour"])`

`[1] 21.09249`

**SNP**

`var(fdata$IMMBRIT[fdata$party_self=="SNP"])`

`[1] 145`

`sd(fdata$IMMBRIT[fdata$party_self=="SNP"])`

`[1] 12.04159`

**Greens**

`var(fdata$IMMBRIT[fdata$party_self=="Greens"])`

`[1] 591.8103`

`sd(fdata$IMMBRIT[fdata$party_self=="Greens"])`

`[1] 24.32715`

**UKIP**

`var(fdata$IMMBRIT[fdata$party_self=="Ukip"])`

`[1] 288.2796`

`sd(fdata$IMMBRIT[fdata$party_self=="Ukip"])`

`[1] 16.9788`

**BNP**

`var(fdata$IMMBRIT[fdata$party_self=="BNP"])`

`[1] 657.1895`

`sd(fdata$IMMBRIT[fdata$party_self=="BNP"])`

`[1] 25.63571`

**Other**

`var(fdata$IMMBRIT[fdata$party_self=="other"])`

`[1] 434.8236`

`sd(fdata$IMMBRIT[fdata$party_self=="other"])`

`[1] 20.85242`
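Repeating the subsetting for every party works, but the same per-group statistics can also be computed in one call with `tapply()`. The following is a sketch on a hypothetical toy data frame whose column names mirror `fdata`; the values are made up.

```r
# Hypothetical toy data (not the course dataset)
toy <- data.frame(
  IMMBRIT    = c(10, 20, 30, 15, 25, 40, 50),
  party_self = factor(c("Tories", "Tories", "Tories",
                        "Labour", "Labour", "Greens", "Greens"))
)

# Apply var() and sd() to IMMBRIT within each level of party_self
group_var <- tapply(toy$IMMBRIT, toy$party_self, var)
group_sd  <- tapply(toy$IMMBRIT, toy$party_self, sd)

# One row per party, one column per statistic
print(cbind(variance = group_var, sd = group_sd))
```

Applied to the real data, the equivalent call would presumably be `tapply(fdata$IMMBRIT, fdata$party_self, sd)`, which should reproduce the standard deviations above.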

### 2.2.9 Exercise 9

Find the party affiliation of the oldest and youngest respondents.

First, we find the ages of the oldest and youngest respondents:

`min(fdata$RAge)`

`[1] 17`

`max(fdata$RAge)`

`[1] 99`

You can also use the `range` function, which gives you both the `min` and `max`:

`range(fdata$RAge)`

`[1] 17 99`

Then we get the row indices of the oldest and youngest respondents:

```
oldest <- which(fdata$RAge == max(fdata$RAge))
youngest <- which(fdata$RAge == min(fdata$RAge))
```

Finally, we can get the party affiliation of those respondents:

`fdata$party_self[oldest]`

```
[1] other Labour
Levels: Tories Labour SNP Greens Ukip BNP other
```

`fdata$party_self[youngest]`

```
[1] other
Levels: Tories Labour SNP Greens Ukip BNP other
```

Two respondents are 99 years old. One identifies with a party other than the six parties we listed, and the other identifies with Labour.

The youngest respondent is 17 and identifies with a party other than the six we listed.
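Using `which()` with `max()` matters here because of the tie at age 99. The shortcut `which.max()` returns only the first position of the maximum, so it would miss the second 99-year-old. A small sketch on hypothetical toy vectors illustrates the difference:

```r
# Hypothetical ages and party labels with a tie for the oldest respondent
age   <- c(34, 99, 17, 99, 52)
party <- c("Labour", "other", "other", "Labour", "Tories")

# All positions that share the maximum age
all_oldest <- which(age == max(age))

# Only the first position of the maximum
first_oldest <- which.max(age)

print(party[all_oldest])    # both tied respondents
print(party[first_oldest])  # only the first of them
```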

### 2.2.10 Exercise 10

Find the 20th, 40th, 60th and 80th percentiles of `RAge`.

`quantile(fdata$RAge, c(.2, .4, .6, .8))`

```
20% 40% 60% 80%
33 44 55 66
```
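To see how `quantile()` behaves on data where the answer is easy to verify by hand, the sketch below applies the same call to the hypothetical vector `1:101`. With R's default method, the p-th quantile of `1:101` falls at position `1 + 100 * p`.

```r
# Hypothetical data: the integers 1 to 101
x <- 1:101

# Same probabilities as in the exercise
q <- quantile(x, c(0.2, 0.4, 0.6, 0.8))
print(q)

# The names of the result record which percentile each value belongs to
print(names(q))
```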

### 2.2.11 Exercise 11

Create a box plot for `IMMBRIT` grouped by the `paper` variable to show the difference between `IMMBRIT` for people who read daily morning newspapers three or more times per week and people who do not.

`boxplot(IMMBRIT ~ paper, data = fdata)`

The two conditional distributions look identical. This plot shows no difference in the subjective number of immigrants for people who read daily morning newspapers and people who do not.
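A visual impression can be backed up with the numbers behind the plot. Calling `boxplot()` with `plot = FALSE` returns the statistics used to draw each box without opening a graphics device. The data frame below is hypothetical toy data, assuming `paper` is coded 0/1:

```r
# Hypothetical toy data (not the course dataset)
toy <- data.frame(
  IMMBRIT = c(10, 20, 30, 40, 15, 20, 30, 40, 45),
  paper   = c(0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Compute the box plot statistics without drawing the plot
bp <- boxplot(IMMBRIT ~ paper, data = toy, plot = FALSE)

# Rows of $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)

# The third row holds the median of each group
group_medians <- bp$stats[3, ]
names(group_medians) <- bp$names
print(group_medians)
```

If the medians and hinges of the two groups are close, that supports reading the two distributions as similar.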

### 2.2.12 Exercise 12

What is the mean of `IMMBRIT` for men and for women?

**Men**

`mean(fdata$IMMBRIT[fdata$RSex==1])`

`[1] 24.53766`

**Women**

`mean(fdata$IMMBRIT[fdata$RSex==2])`

`[1] 32.79159`

The mean for men is 24.54 and the mean for women is 32.79.

### 2.2.13 Exercise 13

What is the numerical difference between those two means?

`mean(fdata$IMMBRIT[fdata$RSex==2]) - mean(fdata$IMMBRIT[fdata$RSex==1])`

`[1] 8.253937`

The difference in means between women and men is 8.25. Put differently, women on average overestimate the number of immigrants more than men do. The difference seems quite large: 8.25 per 100, or 8.25 percentage points.
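The two group means and their difference can also be obtained from a single `tapply()` call. This is a sketch on hypothetical toy data, assuming the same coding as above (`RSex` 1 = men, 2 = women):

```r
# Hypothetical toy data (not the course dataset)
toy <- data.frame(
  IMMBRIT = c(20, 30, 25, 35, 40, 30),
  RSex    = c(1, 1, 1, 2, 2, 2)
)

# Mean of IMMBRIT within each value of RSex
group_means <- tapply(toy$IMMBRIT, toy$RSex, mean)
print(group_means)

# Difference in means: women (2) minus men (1)
mean_diff <- as.numeric(group_means["2"]) - as.numeric(group_means["1"])
print(mean_diff)
```

In the toy data the means are 25 for men and 35 for women, so the difference is 10.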