6.2 Solutions

6.2.1 Exercise 1

Create a new file called assignment6.R in your PUBL0055 folder and write all the solutions in it.

In RStudio, go to the menu and select File > New File > R Script

Make sure to clear the environment and set the working directory.

rm(list = ls())
setwd("~/PUBL0055")

Go to the menu and select File > Save and name it assignment5.R

Next, we load all the packages we need for these exercises.

library(texreg)

6.2.2 Exercise 2

  1. Using the quadratic.model that we estimated above, where we included the square of GDP/capita, what is the effect of:
    1. an increase of GDP/capita from 5000 to 15000?
    2. an increase of GDP/capita from 25000 to 35000?

Load world data

world_data <- read.csv("QoG2012.csv")

Rename variables

names(world_data)[names(world_data)=="undp_hdi"] <- "human_development"
names(world_data)[names(world_data)=="wbgi_cce"] <- "institutions_quality"
names(world_data)[names(world_data)=="wdi_gdpc"] <- "gdp_capita"

Drop missing data

world_data <- world_data[ !is.na(world_data$gdp_capita), ]
world_data <- world_data[ !is.na(world_data$human_development), ]
world_data <- world_data[ !is.na(world_data$institutions_quality), ]

Run the quadratic model

quadratic.model <- lm(
  human_development ~ poly(gdp_capita, 2), 
  data = world_data
)
  1. an increase of GDP/capita from 5000 to 15000?

For this question, we make two predictions. One, where gdp/capita is 5000 and one where it is 15000.

y_hat1 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 5000))
y_hat1
        1 
0.6443723 
y_hat2 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 15000))
y_hat2
        1 
0.8272318 

The effect of raising gdp/capita from 5000 to 15000 is the difference between our two predictions (called the first difference).

y_hat2 - y_hat1
        1 
0.1828595 

The quality of life improves by 0.18 according to our model when we raise gdp/capita from 5000 to 15000. Given that the human development index ranges from 0 - 1 (theoretical range), the effect is extremely large.

  1. an increase of GDP/capita from 25000 to 35000?

We go through the same procedure.

y_hat1 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 25000))
y_hat2 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 35000))
y_hat2 - y_hat1
         1 
0.04116257 

The quality of life improves by only 0.04 when we increase gdp/capita by 10000. Although, the increase in wealth was 10,000 in both scenarios, effect is greater if the society is not already rich.

6.2.3 Exercise 3

You can see that the curve in our quadratic plot curves down when countries become very rich. Speculate whether that results make sense and what the reason for this might be.

The downward curve does not make sense because it does not reflect a relationship that we actually observe in our data. The decline in life quality is due to the functional form of the square of gdp. It has to slope down at some point. We would not want to draw the conclusion that increasing wealth at some point leads to decline in the quality of life.

6.2.4 Exercise 4

  1. Raise GDP/captia to the highest power using the poly() that improves model fit according to adjusted R-squared.
    1. Does your new model solve the potentially artificial down-curve for rich countries?
    2. Does the new model improve upon the old model?
    3. Plot the new model.

To answer this question, we raise gdp/capita by one and compare model fit until adding another power does not improve model fit.

Power of 2

model_2 <- lm(human_development ~ poly(gdp_capita, 2), data = world_data)
summary(model_2)$adj.r.squared
[1] 0.6641974

Power of 3

model_3 <- lm(human_development ~ poly(gdp_capita, 3), data = world_data)
summary(model_3)$adj.r.squared
[1] 0.7382094

Power of 4

model_4 <- lm(human_development ~ poly(gdp_capita, 4), data = world_data)
summary(model_4)$adj.r.squared
[1] 0.7688193

Power of 5

model_5 <- lm(human_development ~ poly(gdp_capita, 5), data = world_data)
summary(model_5)$adj.r.squared
[1] 0.8126378

Power of 6

model_6 <- lm(human_development ~ poly(gdp_capita, 6), data = world_data)
summary(model_6)$adj.r.squared
[1] 0.8275023

Power of 7

model_7 <- lm(human_development ~ poly(gdp_capita, 7), data = world_data)
summary(model_7)$adj.r.squared
[1] 0.83808

Power of 8

model_8 <- lm(human_development ~ poly(gdp_capita, 8), data = world_data)
summary(model_8)$adj.r.squared
[1] 0.8378012

The result is that raising gdp/captia to the power of seven provides the best model fit (i.e. it has the highest adjusted \(R^2\)). We had to manually add powers of gdp to find the answer. There is a programmatic way to solve this problem quicker by writing a loop. We don’t cover it in the seminar but you can easily find examples of how to do that with a quick search online if you’re interested.

  1. Does your new model solve the potentially artefical down-curve for rich countries?
  2. Does the new model improve upon the old model?
  3. Plot the new model.

We plot the polynomial to answer a) . To do so, we vary gdp/capita from its minimum to the maximum. This is the value of gdp values that we plot on the x axis. We use the predict() function to predict outcomes \(\hat{Y}\).

We create a sequence of 100 GDP/capita values

gdp_seq <- seq(from = 226, to = 63686, length.out = 100)

We set our covariate values with GDP as the only covariate. We then predict the outcome (human development index) for each of the GDP levels.

y_hat <- predict(model_7, newdata = data.frame(gdp_capita = gdp_seq))

Now we can plot the results

plot(
  human_development ~ gdp_capita, 
  data = world_data,
  main = "Relationship between the quality of life and wealth",
  xlab = "GDP per capita",
  ylab = "Human development index",
  pch = 20,
  frame.plot = FALSE,
  col = "LightSkyBlue"
)

# plot polynomial
lines(x = gdp_seq, y = y_hat, col = "red")

According to the adjusted \(R^2\), the model fit improves when we fit a 7th degree polynomial to the data. Although a 7th degree polynomial is very flexible, and can certainly fit these specific data points very well, it is very important to remember that we have a sample of data. This sample is subject to sampling variability, which means our sample contains some idiosyncratic aspects that do not reflect the systematic pattern between GDP/captia and the human development index. We call the systematic pattern the “signal” and the random ideosyncratic bit “noise”.

In this application, our 7th degree polynomial is probably too flexible. Although it fits the data in our sample well, we almost certainly fit our model not just to the signal but also to the noise. We want to be parsimonious with our use of polynomials. Without advanced statistics, the general advise is to stay clear of higher degree polynomials. In published articles you often see a quadratic term. Sometimes you may see a cubic term, but anything above that is not very common.

6.2.5 Exercise 5

Estimate a model where wbgi_pse (political stability) is the response variable and h_j and former_col are the explanatory variables. Include an interaction between your explanatory variables. What is the marginal effect of:

  1. An independent judiciary when the country is a former colony?
  2. An independent judiciary when the country was not colonized?
model1 <- lm(wbgi_pse ~ h_j * former_col, data = world_data)
screenreg(model1)

==========================
                Model 1   
--------------------------
(Intercept)      -0.66 ** 
                 (0.21)   
h_j               1.37 ***
                 (0.24)   
former_col        0.17    
                 (0.23)   
h_j:former_col   -0.80 *  
                 (0.32)   
--------------------------
R^2               0.30    
Adj. R^2          0.29    
Num. obs.       158       
RMSE              0.83    
==========================
*** p < 0.001, ** p < 0.01, * p < 0.05

In this setting, an interaction does not make sense. We run a model on political stability (dependent variable). Our only two independent variables are the judiciary (h_j) and colonial past (former_col). With these two binary variables only, we have 4 possible combinations:

Judiciary Ex Colony Estimate
0 0 \(= \beta_0\)
\(= -0.66\)
1 0 \(= \beta_0 + \beta_1\)
\(= -0.66 + 1.37\)
\(= 0.70\)
0 1 \(= \beta_0 + \beta_2\)
\(= -0.66 + 0.17\)
\(= -0.49\)
1 1 \(= \beta_0 + \beta_1 + \beta_2 + \beta_3\)
\(= -0.66 + 1.37 + 0.17 + -0.80\)
\(= 0.08\)

So the effect of an independent judiciary when the country is a former colony is:

\[ \begin{split} & (\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2) \\ = & (-0.66 + 1.37 + 0.17 + -0.8) - ( -0.66 + 0.17) \\ = & 1.37 + -0.8 \\ = & \beta_1 + \beta_3 \\ \end{split} \]

And the effect of an independent judiciary when the country is not a former colony is:

\[ \begin{split} & (\beta_0 + \beta_1) - (\beta_0 ) \\ = & (-0.66 + 1.37) - (-0.66) \\ = & 1.37\\ = & \beta_1 \end{split} \]

6.2.6 Exercise 5

Run a model on the human development index (hdi), interacting an independent judiciary (h_j) and control of corruption (corruption_control). What is the effect of control of corruption:

  1. In countries without an independent judiciary?
  2. When there is an independent judiciary?
  3. Illustrate your results.
  4. Does the interaction improve model fit?
model1 <- lm(human_development ~ institutions_quality * h_j, data = world_data)
screenreg(model1)

====================================
                          Model 1   
------------------------------------
(Intercept)                 0.67 ***
                           (0.02)   
institutions_quality        0.10 ***
                           (0.02)   
h_j                         0.05 *  
                           (0.03)   
institutions_quality:h_j    0.01    
                           (0.03)   
------------------------------------
R^2                         0.48    
Adj. R^2                    0.47    
Num. obs.                 158       
RMSE                        0.13    
====================================
*** p < 0.001, ** p < 0.01, * p < 0.05
  1. What is the effect of quality of institutions in countries without an independent judiciary?

The effect of institutions quality is \(\beta_1 = 0.10\).

  1. What is the effect of quality of institutions when there is an independent judiciary?

The effect of institutions quality is \(\beta_1 + \beta_3 = 0.10 + 0.01 = 0.11\).

  1. Illustrate your results.

Check the range of institutions_quality

summary(world_data$institutions_quality)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.69953 -0.81039 -0.28942 -0.01987  0.54041  2.44565 

Create a sequence to vary the quality of institutions

inst_qual <- seq(-1.7, 2.4, length.out = 100)

Set covariates when free judiciary is 0, and get predictions

controlled_judiciary <- data.frame(institutions_quality = inst_qual, h_j = 0)

y_hat1 <- predict(model1, newdata = controlled_judiciary)

Set covariates when free judiciary is 1, and get predictions

free_judiciary <- data.frame(institutions_quality = inst_qual, h_j = 1)

y_hat2 <- predict(model1, newdata = free_judiciary)

Now we plot the models

world_data$h_j <- factor(
  world_data$h_j, 
  c(0, 1), 
  labels = c("Controlled judiciary", "Independent judiciary")
)

plot(
  human_development ~ institutions_quality, 
  data = world_data,
  main = "Relationship between the quality of life and quality of institutions",
  ylab = "Human development index",
  xlab = "Quality of Institutions",
  pch = 20,
  frame.plot = FALSE,
  col = h_j
)

# add a legend
legend(
  "bottomright",  
  legend = levels(world_data$h_j), 
  col = c("black", "red"),
  lty = 1,
  pch = 20, 
  bty = "n" 
)

# add lines for free and controlled judiciary
lines(x = inst_qual, y = y_hat1, col = "black")
lines(x = inst_qual, y = y_hat2, col = "red")

The effect of the quality of institutions does not seem to be conditional on whether a country has a controlled or an independent judiciary. The interaction term is insignificant and we can see that the slope of the lines is quite similar. We would not interpret the effect of the quality of institutions as conditional. It’s substantially similar in both groups.

interaction_model <- lm(human_development ~ institutions_quality * h_j, data = world_data)
screenreg(interaction_model)

=========================================================
                                               Model 1   
---------------------------------------------------------
(Intercept)                                      0.67 ***
                                                (0.02)   
institutions_quality                             0.10 ***
                                                (0.02)   
h_jIndependent judiciary                         0.05 *  
                                                (0.03)   
institutions_quality:h_jIndependent judiciary    0.01    
                                                (0.03)   
---------------------------------------------------------
R^2                                              0.48    
Adj. R^2                                         0.47    
Num. obs.                                      158       
RMSE                                             0.13    
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
  1. Does the interaction improve model fit?
no_interaction_model <- lm(human_development ~ institutions_quality + h_j, data = world_data)
screenreg(no_interaction_model)

====================================
                          Model 1   
------------------------------------
(Intercept)                 0.68 ***
                           (0.02)   
institutions_quality        0.11 ***
                           (0.01)   
h_jIndependent judiciary    0.05 *  
                           (0.03)   
------------------------------------
R^2                         0.48    
Adj. R^2                    0.47    
Num. obs.                 158       
RMSE                        0.13    
====================================
*** p < 0.001, ** p < 0.01, * p < 0.05

The adjusted \(R^2\) indicates that the interaction model does not improve model quality.