## 6.2 Solutions

### 6.2.1 Exercise 1

Create a new file called `assignment6.R`

in your `PUBL0055`

folder and write all the solutions in it.

In RStudio, go to the menu and select **File** > **New File** > **R Script**

Make sure to clear the environment and set the working directory.

```
rm(list = ls())
setwd("~/PUBL0055")
```

Go to the menu and select **File** > **Save** and name it `assignment5.R`

Next, we load all the packages we need for these exercises.

`library(texreg)`

### 6.2.2 Exercise 2

- Using the
`quadratic.model`

that we estimated above, where we included the square of GDP/capita, what is the effect of:- an increase of GDP/capita from 5000 to 15000?
- an increase of GDP/capita from 25000 to 35000?

Load world data

`world_data <- read.csv("QoG2012.csv")`

Rename variables

```
names(world_data)[names(world_data)=="undp_hdi"] <- "human_development"
names(world_data)[names(world_data)=="wbgi_cce"] <- "institutions_quality"
names(world_data)[names(world_data)=="wdi_gdpc"] <- "gdp_capita"
```

Drop missing data

```
world_data <- world_data[ !is.na(world_data$gdp_capita), ]
world_data <- world_data[ !is.na(world_data$human_development), ]
world_data <- world_data[ !is.na(world_data$institutions_quality), ]
```

Run the quadratic model

```
quadratic.model <- lm(
human_development ~ poly(gdp_capita, 2),
data = world_data
)
```

- an increase of GDP/capita from 5000 to 15000?

For this question, we make two predictions. One, where gdp/capita is 5000 and one where it is 15000.

```
y_hat1 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 5000))
y_hat1
```

```
1
0.6443723
```

```
y_hat2 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 15000))
y_hat2
```

```
1
0.8272318
```

The effect of raising gdp/capita from 5000 to 15000 is the difference between our two predictions (called the first difference).

`y_hat2 - y_hat1`

```
1
0.1828595
```

The quality of life improves by 0.18 according to our model when we raise gdp/capita from 5000 to 15000. Given that the human development index ranges from 0 - 1 (theoretical range), the effect is extremely large.

- an increase of GDP/capita from 25000 to 35000?

We go through the same procedure.

```
y_hat1 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 25000))
y_hat2 <- predict(quadratic.model, newdata = data.frame(gdp_capita = 35000))
y_hat2 - y_hat1
```

```
1
0.04116257
```

The quality of life improves by only 0.04 when we increase gdp/capita by 10000. Although, the increase in wealth was 10,000 in both scenarios, effect is greater if the society is not already rich.

### 6.2.3 Exercise 3

You can see that the curve in our quadratic plot curves down when countries become very rich. Speculate whether that results make sense and what the reason for this might be.

The downward curve does not make sense because it does not reflect a relationship that we actually observe in our data. The decline in life quality is due to the functional form of the square of gdp. It has to slope down at some point. We would not want to draw the conclusion that increasing wealth at some point leads to decline in the quality of life.

### 6.2.4 Exercise 4

- Raise GDP/captia to the highest power using the
`poly()`

that improves model fit according to adjusted R-squared.- Does your new model solve the potentially artificial down-curve for rich countries?
- Does the new model improve upon the old model?
- Plot the new model.

To answer this question, we raise gdp/capita by one and compare model fit until adding another power does not improve model fit.

**Power of 2**

```
model_2 <- lm(human_development ~ poly(gdp_capita, 2), data = world_data)
summary(model_2)$adj.r.squared
```

`[1] 0.6641974`

**Power of 3**

```
model_3 <- lm(human_development ~ poly(gdp_capita, 3), data = world_data)
summary(model_3)$adj.r.squared
```

`[1] 0.7382094`

**Power of 4**

```
model_4 <- lm(human_development ~ poly(gdp_capita, 4), data = world_data)
summary(model_4)$adj.r.squared
```

`[1] 0.7688193`

**Power of 5**

```
model_5 <- lm(human_development ~ poly(gdp_capita, 5), data = world_data)
summary(model_5)$adj.r.squared
```

`[1] 0.8126378`

**Power of 6**

```
model_6 <- lm(human_development ~ poly(gdp_capita, 6), data = world_data)
summary(model_6)$adj.r.squared
```

`[1] 0.8275023`

**Power of 7**

```
model_7 <- lm(human_development ~ poly(gdp_capita, 7), data = world_data)
summary(model_7)$adj.r.squared
```

`[1] 0.83808`

**Power of 8**

```
model_8 <- lm(human_development ~ poly(gdp_capita, 8), data = world_data)
summary(model_8)$adj.r.squared
```

`[1] 0.8378012`

The result is that raising gdp/captia to the power of seven provides the best model fit (i.e. it has the highest adjusted \(R^2\)). We had to manually add powers of gdp to find the answer. There is a programmatic way to solve this problem quicker by writing a loop. We don’t cover it in the seminar but you can easily find examples of how to do that with a quick search online if you’re interested.

- Does your new model solve the potentially artefical down-curve for rich countries?
- Does the new model improve upon the old model?
- Plot the new model.

We plot the polynomial to answer a) . To do so, we vary gdp/capita from its minimum to the maximum. This is the value of gdp values that we plot on the x axis. We use the `predict()`

function to predict outcomes \(\hat{Y}\).

We create a sequence of 100 GDP/capita values

`gdp_seq <- seq(from = 226, to = 63686, length.out = 100)`

We set our covariate values with GDP as the only covariate. We then predict the outcome (human development index) for each of the GDP levels.

`y_hat <- predict(model_7, newdata = data.frame(gdp_capita = gdp_seq))`

Now we can plot the results

```
plot(
human_development ~ gdp_capita,
data = world_data,
main = "Relationship between the quality of life and wealth",
xlab = "GDP per capita",
ylab = "Human development index",
pch = 20,
frame.plot = FALSE,
col = "LightSkyBlue"
)
# plot polynomial
lines(x = gdp_seq, y = y_hat, col = "red")
```

According to the adjusted \(R^2\), the model fit improves when we fit a 7th degree polynomial to the data. Although a 7th degree polynomial is very flexible, and can certainly fit these specific data points very well, it is very important to remember that we have a sample of data. This sample is subject to sampling variability, which means our sample contains some idiosyncratic aspects that do not reflect the systematic pattern between GDP/captia and the human development index. We call the systematic pattern the “signal” and the random ideosyncratic bit “noise”.

In this application, our 7th degree polynomial is probably too flexible. Although it fits the data **in our sample** well, we almost certainly fit our model not just to the signal but also to the noise. We want to be parsimonious with our use of polynomials. Without advanced statistics, the general advise is to stay clear of higher degree polynomials. In published articles you often see a quadratic term. Sometimes you may see a cubic term, but anything above that is not very common.

### 6.2.5 Exercise 5

Estimate a model where `wbgi_pse`

(political stability) is the response variable and `h_j`

and `former_col`

are the explanatory variables. Include an interaction between your explanatory variables. What is the marginal effect of:

- An independent judiciary when the country is a former colony?
- An independent judiciary when the country was not colonized?

```
model1 <- lm(wbgi_pse ~ h_j * former_col, data = world_data)
screenreg(model1)
```

```
==========================
Model 1
--------------------------
(Intercept) -0.66 **
(0.21)
h_j 1.37 ***
(0.24)
former_col 0.17
(0.23)
h_j:former_col -0.80 *
(0.32)
--------------------------
R^2 0.30
Adj. R^2 0.29
Num. obs. 158
RMSE 0.83
==========================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

In this setting, an interaction does not make sense. We run a model on political stability (dependent variable). Our only two independent variables are the judiciary (`h_j`

) and colonial past (`former_col`

). With these two binary variables only, we have 4 possible combinations:

Judiciary | Ex Colony | Estimate |
---|---|---|

0 | 0 | \(= \beta_0\) \(= -0.66\) |

1 | 0 | \(= \beta_0 + \beta_1\) \(= -0.66 + 1.37\) \(= 0.70\) |

0 | 1 | \(= \beta_0 + \beta_2\) \(= -0.66 + 0.17\) \(= -0.49\) |

1 | 1 | \(= \beta_0 + \beta_1 + \beta_2 + \beta_3\) \(= -0.66 + 1.37 + 0.17 + -0.80\) \(= 0.08\) |

So the effect of an independent judiciary when the country is a former colony is:

\[ \begin{split} & (\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2) \\ = & (-0.66 + 1.37 + 0.17 + -0.8) - ( -0.66 + 0.17) \\ = & 1.37 + -0.8 \\ = & \beta_1 + \beta_3 \\ \end{split} \]

And the effect of an independent judiciary when the country is not a former colony is:

\[ \begin{split} & (\beta_0 + \beta_1) - (\beta_0 ) \\ = & (-0.66 + 1.37) - (-0.66) \\ = & 1.37\\ = & \beta_1 \end{split} \]

### 6.2.6 Exercise 5

Run a model on the human development index (hdi), interacting an independent judiciary (h_j) and control of corruption (corruption_control). What is the effect of control of corruption:

- In countries without an independent judiciary?
- When there is an independent judiciary?
- Illustrate your results.
- Does the interaction improve model fit?

```
model1 <- lm(human_development ~ institutions_quality * h_j, data = world_data)
screenreg(model1)
```

```
====================================
Model 1
------------------------------------
(Intercept) 0.67 ***
(0.02)
institutions_quality 0.10 ***
(0.02)
h_j 0.05 *
(0.03)
institutions_quality:h_j 0.01
(0.03)
------------------------------------
R^2 0.48
Adj. R^2 0.47
Num. obs. 158
RMSE 0.13
====================================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

- What is the effect of quality of institutions in countries without an independent judiciary?

The effect of institutions quality is \(\beta_1 = 0.10\).

- What is the effect of quality of institutions when there is an independent judiciary?

The effect of institutions quality is \(\beta_1 + \beta_3 = 0.10 + 0.01 = 0.11\).

- Illustrate your results.

Check the range of `institutions_quality`

`summary(world_data$institutions_quality)`

```
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.69953 -0.81039 -0.28942 -0.01987 0.54041 2.44565
```

Create a sequence to vary the quality of institutions

`inst_qual <- seq(-1.7, 2.4, length.out = 100)`

Set covariates when free judiciary is 0, and get predictions

```
controlled_judiciary <- data.frame(institutions_quality = inst_qual, h_j = 0)
y_hat1 <- predict(model1, newdata = controlled_judiciary)
```

Set covariates when free judiciary is 1, and get predictions

```
free_judiciary <- data.frame(institutions_quality = inst_qual, h_j = 1)
y_hat2 <- predict(model1, newdata = free_judiciary)
```

Now we plot the models

```
world_data$h_j <- factor(
world_data$h_j,
c(0, 1),
labels = c("Controlled judiciary", "Independent judiciary")
)
plot(
human_development ~ institutions_quality,
data = world_data,
main = "Relationship between the quality of life and quality of institutions",
ylab = "Human development index",
xlab = "Quality of Institutions",
pch = 20,
frame.plot = FALSE,
col = h_j
)
# add a legend
legend(
"bottomright",
legend = levels(world_data$h_j),
col = c("black", "red"),
lty = 1,
pch = 20,
bty = "n"
)
# add lines for free and controlled judiciary
lines(x = inst_qual, y = y_hat1, col = "black")
lines(x = inst_qual, y = y_hat2, col = "red")
```

The effect of the quality of institutions does not seem to be conditional on whether a country has a controlled or an independent judiciary. The interaction term is insignificant and we can see that the slope of the lines is quite similar. We would not interpret the effect of the quality of institutions as conditional. It’s substantially similar in both groups.

```
interaction_model <- lm(human_development ~ institutions_quality * h_j, data = world_data)
screenreg(interaction_model)
```

```
=========================================================
Model 1
---------------------------------------------------------
(Intercept) 0.67 ***
(0.02)
institutions_quality 0.10 ***
(0.02)
h_jIndependent judiciary 0.05 *
(0.03)
institutions_quality:h_jIndependent judiciary 0.01
(0.03)
---------------------------------------------------------
R^2 0.48
Adj. R^2 0.47
Num. obs. 158
RMSE 0.13
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

- Does the interaction improve model fit?

```
no_interaction_model <- lm(human_development ~ institutions_quality + h_j, data = world_data)
screenreg(no_interaction_model)
```

```
====================================
Model 1
------------------------------------
(Intercept) 0.68 ***
(0.02)
institutions_quality 0.11 ***
(0.01)
h_jIndependent judiciary 0.05 *
(0.03)
------------------------------------
R^2 0.48
Adj. R^2 0.47
Num. obs. 158
RMSE 0.13
====================================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

The adjusted \(R^2\) indicates that the interaction model does not improve model quality.