2.2 Solutions

2.2.1 Exercise 2

Clear the workspace and set the working directory to your PUBLG100 folder.

# clear workspace
rm(list = ls())
# set working directory
setwd("~/PUBLG100")

2.2.2 Exercise 3

Load the non-western foreigners dataset from your local drive into R.

load("non_western_foreingners.RData")

2.2.3 Exercise 4

What is the level of measurement for each variable in the non-western foreigners dataset?

  1. IMMBRIT is interval scaled (continuous).
  2. over.estimate is categorical with 2 categories, also called a binary variable, a dummy variable, or an indicator variable.
  3. RSex is also categorical with 2 categories.
  4. RAge is interval scaled.
  5. Househld is interval scaled.
  6. paper is categorical with 2 categories.
  7. WWWhourspW is interval scaled.
  8. religious is categorical with 2 categories.
  9. employMonths is interval scaled.
  10. urban is an ordinal variable.
  11. health.good is an ordinally scaled variable.
  12. HHInc: we do not have enough information to determine whether HHInc is interval scaled or ordinal. If the income bands are equally wide, HHInc would be interval scaled. We will treat the variable as interval scaled.
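A quick way to support such a classification is to look at how each variable is stored and how many distinct values it takes. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not the course data). Storage type alone does not settle the level of measurement, so the codebook remains the final authority.

```r
# Toy data frame with hypothetical values (not the course data)
toy <- data.frame(
  RAge      = c(23, 45, 67, 34),
  religious = c(0, 1, 1, 0),
  urban     = c(1, 4, 2, 3)
)

# Storage type of every column
str(toy)

# Number of distinct values per variable; exactly 2 suggests a binary variable
n_distinct <- sapply(toy, function(x) length(unique(x)))
n_distinct
```

The same two calls could be applied to `fdata` once it is loaded.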

2.2.4 Exercise 5

Calculate the correct measure of central tendency for RAge, Househld, religious.

The correct measures of central tendency for the three levels of measurement are:

level_of_measurement   central_tendency
categorical            Mode
ordinal                Median
interval               Mean

mean(fdata$RAge)
[1] 49.74547
mean(fdata$Househld)
[1] 2.391802
mean(fdata$religious)
[1] 0.4928503

The mean of RAge is 49.7 and the mean of Househld is 2.4. The mode of religious is 0.

Note: Because religious is binary, its mean is the proportion of 1s, which tells us the mode. 49% of respondents are religious, which is less than half, so the more frequent value (the mode) is 0, not religious.
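The link between the mean and the mode of a binary variable can be checked directly with a frequency table. This is a minimal sketch on a made-up vector (`religious_toy` is hypothetical); note that `which.max()` returns only the first category if two categories are tied.

```r
# Hypothetical binary variable: 1 = religious, 0 = not religious
religious_toy <- c(0, 1, 0, 0, 1, 0, 1, 0, 1, 0)

# The mean of a 0/1 variable is the proportion of 1s
prop_religious <- mean(religious_toy)

# Frequency table and the most frequent value (the mode)
freq <- table(religious_toy)
mode_value <- as.numeric(names(freq)[which.max(freq)])

prop_religious  # below 0.5, so the mode must be 0
mode_value
```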

2.2.5 Exercise 6

Calculate the correct measure of dispersion for RAge, Househld, religious.

sd(fdata$RAge)
[1] 17.57245
sd(fdata$Househld)
[1] 1.339352
mean(fdata$religious)
[1] 0.4928503

The standard deviation of age is 17.6, and the standard deviation of the number of people in the respondent's household is 1.3. For the binary variable religious, the proportions in each category describe the dispersion: 49% of the respondents are religious and 51% are not.
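For a categorical variable, the share of observations in each category can also be read off a proportion table rather than inferred from the mean. A short sketch on a made-up vector (`religious_toy` is hypothetical):

```r
# Hypothetical binary variable: 1 = religious, 0 = not religious
religious_toy <- c(0, 1, 0, 0, 1, 0, 1, 0, 1, 0)

# Share of observations in each category
shares <- prop.table(table(religious_toy))
shares
```

This approach also works for categorical variables with more than two categories, where the mean is no longer informative.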

2.2.6 Exercise 7

How many respondents identify with the Greens?

fdata$party_self <- factor(fdata$party_self, labels = c("Tories", "Labour", "SNP", "Greens", "Ukip", "BNP", "other"))

# first solution, just look at the frequency table
table(fdata$party_self)

Tories Labour    SNP Greens   Ukip    BNP  other 
   284    280     16     23     31     32    383 
# another solution using the which() function
length(which(fdata$party_self=="Greens"))
[1] 23

23 respondents identify with the Green party.
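A third way to count matches is to sum the logical comparison, since `TRUE` counts as 1. The sketch below uses a made-up factor (`party_toy` is hypothetical) with one missing value, to show where the approaches differ: `which()` silently drops `NA`, while `sum()` returns `NA` unless `na.rm = TRUE` is set.

```r
# Hypothetical party variable with one missing value
party_toy <- factor(c("Labour", "Greens", NA, "Greens", "other"))

# which() ignores the NA comparison result
n_which <- length(which(party_toy == "Greens"))

# sum() propagates the NA unless told to remove it
n_sum_raw <- sum(party_toy == "Greens")
n_sum     <- sum(party_toy == "Greens", na.rm = TRUE)
```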

2.2.7 Exercise 8

Calculate the variance and standard deviation of IMMBRIT for each party affiliation.

# conservatives
var(fdata$IMMBRIT[fdata$party_self=="Tories"])
[1] 431.8308
sd(fdata$IMMBRIT[fdata$party_self=="Tories"])
[1] 20.78054
# labour
var(fdata$IMMBRIT[fdata$party_self=="Labour"])
[1] 444.8932
sd(fdata$IMMBRIT[fdata$party_self=="Labour"])
[1] 21.09249
# snp
var(fdata$IMMBRIT[fdata$party_self=="SNP"])
[1] 145
sd(fdata$IMMBRIT[fdata$party_self=="SNP"])
[1] 12.04159
# greens
var(fdata$IMMBRIT[fdata$party_self=="Greens"])
[1] 591.8103
sd(fdata$IMMBRIT[fdata$party_self=="Greens"])
[1] 24.32715
# ukip
var(fdata$IMMBRIT[fdata$party_self=="Ukip"])
[1] 288.2796
sd(fdata$IMMBRIT[fdata$party_self=="Ukip"])
[1] 16.9788
# bnp
var(fdata$IMMBRIT[fdata$party_self=="BNP"])
[1] 657.1895
sd(fdata$IMMBRIT[fdata$party_self=="BNP"])
[1] 25.63571
# other
var(fdata$IMMBRIT[fdata$party_self=="other"])
[1] 434.8236
sd(fdata$IMMBRIT[fdata$party_self=="other"])
[1] 20.85242
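The repeated subsetting above can be condensed with `tapply()`, which applies a function to one variable within each level of a grouping factor. The sketch below runs on a small made-up data frame (`toy` and its values are hypothetical).

```r
# Toy data with hypothetical values (not the course data)
toy <- data.frame(
  IMMBRIT    = c(10, 20, 30, 40, 50, 90),
  party_self = factor(c("Tories", "Tories", "Tories",
                        "Labour", "Labour", "Labour"))
)

# Variance and standard deviation of IMMBRIT within each party
var_by_party <- tapply(toy$IMMBRIT, toy$party_self, var)
sd_by_party  <- tapply(toy$IMMBRIT, toy$party_self, sd)

var_by_party
sd_by_party
```

With the course data, the equivalent call would be `tapply(fdata$IMMBRIT, fdata$party_self, var)`, assuming IMMBRIT has no missing values within any group.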

2.2.8 Exercise 9

Find the party affiliation of the oldest and youngest respondents.

# max and min ages
range(fdata$RAge)
[1] 17 99
# row index numbers of oldest and youngest respondents
oldest <- which(fdata$RAge == max(fdata$RAge))
youngest <- which(fdata$RAge == min(fdata$RAge))

# party affiliation of those respondents
fdata$party_self[oldest]
[1] other  Labour
Levels: Tories Labour SNP Greens Ukip BNP other
fdata$party_self[youngest]
[1] other
Levels: Tories Labour SNP Greens Ukip BNP other

Two respondents were 99 years old. One identifies with a party other than the six parties we listed, and the other respondent identifies with Labour.

The youngest respondent is 17 and identifies with a party other than the six we listed.
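Comparing against `max()` inside `which()` matters when several respondents share the extreme value. The sketch below, on made-up vectors (`age_toy` and `party_toy` are hypothetical), contrasts it with `which.max()`, which returns only the first position of the maximum.

```r
# Hypothetical ages and party affiliations; two respondents share the maximum age
age_toy   <- c(45, 99, 17, 99, 30)
party_toy <- c("Tories", "other", "other", "Labour", "SNP")

# All positions that hold the maximum age
all_oldest <- which(age_toy == max(age_toy))

# Only the first position that holds the maximum age
first_oldest <- which.max(age_toy)

party_toy[all_oldest]    # both tied respondents
party_toy[first_oldest]  # misses the second one
```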

2.2.9 Exercise 10

Find the 20th, 40th, 60th and 80th percentiles of RAge.

quantile(fdata$RAge, c(.2, .4, .6, .8))
20% 40% 60% 80% 
 33  44  55  66 

2.2.10 Exercise 11

Create a box plot for IMMBRIT grouped by the paper variable to show the difference between IMMBRIT for people who read daily morning newspapers three or more times per week and people who do not.

boxplot( IMMBRIT ~ paper, data = fdata)

The two conditional distributions look identical. This plot shows no difference in the subjective number of immigrants for people who read daily morning newspapers and people who do not.
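A visual impression like this can be backed up with the numbers the box plot is drawn from. Calling `boxplot()` with `plot = FALSE` returns those statistics without drawing anything. The sketch below uses a made-up data frame (`toy` and its values are hypothetical).

```r
# Toy data with hypothetical values (not the course data)
toy <- data.frame(
  IMMBRIT = c(10, 20, 30, 40, 50, 15, 25, 30, 35, 45),
  paper   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Compute the box plot statistics without plotting
bp <- boxplot(IMMBRIT ~ paper, data = toy, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
# Columns: one per group, in the order given by bp$names
bp$names
bp$stats
```

If the medians and hinges in the two columns are close, the boxes will look alike in the plot.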

2.2.11 Exercise 12

What is the mean of IMMBRIT for men and for women?

# men
mean(fdata$IMMBRIT[fdata$RSex==1])
[1] 24.53766
# women
mean(fdata$IMMBRIT[fdata$RSex==2])
[1] 32.79159

The mean for men is 24.5 and the mean for women is 32.8.

2.2.12 Exercise 13

What is the numerical difference between those two means?

mean(fdata$IMMBRIT[fdata$RSex==2]) - mean(fdata$IMMBRIT[fdata$RSex==1])
[1] 8.253937

The difference in means between women and men is 8.3. Put differently, women overestimate the number of immigrants more than men do. The difference seems quite large: 8.3 per 100 (8.3 percentage points).
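Both group means and their difference can also be obtained in one pass by computing the means per group first. The sketch below runs on a made-up data frame (`toy` and its values are hypothetical) and assumes the same coding as above, with 1 for men and 2 for women.

```r
# Toy data with hypothetical values; RSex coded 1 = men, 2 = women
toy <- data.frame(
  IMMBRIT = c(20, 30, 25, 35, 30, 40),
  RSex    = c(1, 1, 1, 2, 2, 2)
)

# Mean of IMMBRIT within each sex; the result is named by group code
means_by_sex <- tapply(toy$IMMBRIT, toy$RSex, mean)

# Women minus men
diff_means <- means_by_sex[["2"]] - means_by_sex[["1"]]

means_by_sex
diff_means
```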