We do not have knowledge of a thing until we have grasped its why, that is to say, its cause.
Aristotle, Physics
Throughout this course we have been interested in text-as-data approaches to quantitative measurement
Typically, social scientists are interested not only in measuring different quantities, but also in explaining how these quantities vary
Many of the most interesting and important research questions investigate causal relationships between phenomena
Causal inference involves questions about counterfactual outcomes
Quantitative text analyses that answer causal questions are a new and important area of research
Causality: The relationship between events where one set of events (the effects/outcomes) is a direct consequence of another set of events (the causes/treatments).
Causal Inference: The process by which one can use data to make claims about causal relationships.
Goal: Identify the effect of a treatment variable, \(D_i\), on an outcome variable \(Y_i\), sometimes by use of covariates \(X_i\).
Definition: Outcome
\(Y_i\) is the observed value of the outcome variable of interest for unit \(i\). For example, \(Y_i = 1\) if speech \(i\) contains aggressive language and \(Y_i = 0\) otherwise.
(Defined here for the binary case, but we can generalise to continuous outcomes)
Definition: Treatment
\(D_i\): treatment status (causal variable) for unit \(i\)
\[ D_i = \left\{ \begin{array}{ll} 1 & \mbox{if unit $i$ received the treatment}\\ 0 & \mbox{otherwise}. \end{array} \right. \]
E.g. \(D_i = 1\) if the speaker of speech \(i\) is male and \(D_i = 0\) if the speaker is female.
(Defined here for the binary case, but we can generalise to continuous treatments)
Definition: Covariate
\(X_i\): Observed covariate of interest for unit \(i\)
Definition: Potential Outcome
\(Y_{0i}\) and \(Y_{1i}\): Potential outcomes for unit \(i\)
\[ Y_{di} = \left\{ \begin{array}{ll} Y_{1i} & \mbox{potential outcome for unit $i$ with treatment ($d = 1$)}\\ Y_{0i} & \mbox{potential outcome for unit $i$ without treatment ($d = 0$)} \end{array} \right. \]
If \(D_i = 1\), only \(Y_{1i}\) is realised/observed. \(Y_{0i}\) is what the outcome would have been if \(D_i\) had been 0
If \(D_i = 0\), only \(Y_{0i}\) is realised/observed. \(Y_{1i}\) is what the outcome would have been if \(D_i\) had been 1
\(\rightarrow\) potential outcomes are fixed attributes for each \(i\) and represent the outcome that would be observed hypothetically if \(i\) were treated/untreated
Definition: Causal Effect
For each unit \(i\), the causal effect of the treatment on the outcome is defined as the difference between its two potential outcomes: \[ \tau_i \equiv Y_{1i} - Y_{0i} \]
\(\tau_i\) is the difference between two hypothetical states of the world
Definition: Fundamental Problem of Causal Inference
We cannot observe both potential outcomes \((Y_{1i},Y_{0i})\) for the same unit \(i\)
Causal inference is difficult because it is about something we can never see.
–Paul Rosenbaum, Observation and Experiment
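A minimal simulation sketch in R makes this concrete (all quantities below are simulated for illustration): we can construct both potential outcomes for every unit, but any realised dataset reveals only one of them.

```r
# The fundamental problem: only one potential outcome is ever observed
set.seed(99)
n  <- 1000
y0 <- rnorm(n)                          # Y_0i: potential outcome without treatment
y1 <- y0 + 2                            # Y_1i: potential outcome with treatment (tau_i = 2)
d  <- rbinom(n, size = 1, prob = 0.5)   # randomised treatment assignment
y  <- ifelse(d == 1, y1, y0)            # observed outcome: the other potential outcome is lost

mean(y1 - y0)                           # true ATE (knowable only because we simulated both)
mean(y[d == 1]) - mean(y[d == 0])       # difference in group means, close to the ATE here
```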
Definition: Difference in Group Means (DIGM)
Suppose units \(1,\dots,m\) are treated (\(D_i = 1\)) and units \(m+1,\dots,n\) are untreated (\(D_i = 0\)). Then
\[ \text{DIGM} \equiv \frac{1}{m}\sum_{i=1}^m Y_{i} - \frac{1}{n-m}\sum_{i=m+1}^n Y_{i} \]
Problem: \(\text{DIGM}\) captures two different quantities:
\[ \underbrace{E[Y_{1i}|D_i=1] - E[Y_{0i}|D_i=1]}_{\text{average treatment effect on the treated}} + \underbrace{E[Y_{0i}|D_i=1] - E[Y_{0i}|D_i=0]}_{\text{selection bias}} \]
For example:
\(Y_i = 1\) if speech \(i\) contains aggressive language
\(Y_i = 0\) if speech \(i\) does not contain aggressive language
\(D_i = 1\) if the speaker for speech \(i\) is male
\(D_i = 0\) if the speaker for speech \(i\) is female
Imagine that:
\[ \frac{1}{m}\sum_{i=1}^m Y_{i} - \frac{1}{n-m}\sum_{i=m+1}^n Y_{i} > 0 \]
where speeches \(1,\dots,m\) are given by men and speeches \(m+1,\dots,n\) by women.
What would you conclude?
Treatment effect: Men are more aggressive than women
Selection effect: male and female speakers may differ in other ways that affect aggressive language, e.g. they may speak in different debates or on different topics
Confounding/selection bias occurs if and only if there is an omitted variable, \(X\), that: (1) is associated with the treatment, \(D\); and (2) causally affects the outcome, \(Y\).
If such a variable exists, then the simple comparison of group means will not measure the causal effect of \(D\) on \(Y\).
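A small simulation sketch of this bias (all variables below are hypothetical and simulated): \(X\) drives both treatment take-up and the baseline outcome, so the raw difference in group means overstates the true effect, while conditioning on \(X\) recovers it.

```r
# Selection bias from an omitted variable X (simulated illustration)
set.seed(99)
n  <- 10000
x  <- rnorm(n)                      # omitted variable
d  <- rbinom(n, 1, plogis(2 * x))   # X is associated with treatment take-up
y0 <- x + rnorm(n)                  # X also affects the outcome
y1 <- y0 + 1                        # true treatment effect = 1
y  <- ifelse(d == 1, y1, y0)

mean(y[d == 1]) - mean(y[d == 0])   # > 1: treatment effect plus selection bias
coef(lm(y ~ d + x))["d"]            # controlling for X recovers approximately 1
```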
Several commonly used strategies to overcome the selection bias problem:
Randomization (experiments)
Controlling for Confounders (regression, matching)
Other strategies (RDD, IV, etc)
Every week on this course we have studied methods that allow us to take a corpus and create low-dimensional summaries of the texts contained within it
We can think of each summary as a function, \(g()\), which we apply to the dfm word counts, \(W_i\), and which results in a particular quantity, \(\pi_i\), for each document
Each mapping function produces a simplification of the text
There is no single “correct” mapping function, as it will depend on the research question of interest
Researchers typically do not know \(g()\) before they have seen their data
We typically need to look at the data in order to work out a good mapping function
Good measurement often requires iterating between exploration and validation, so looking at the data is essential to generating valid \(g()\) functions
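As a concrete, deliberately simple sketch, here is one possible \(g()\) in R with quanteda: the share of each document's tokens that match an "aggressive language" dictionary. The texts and the dictionary are hypothetical toy examples, not from any real study.

```r
# One possible mapping function g(): dictionary word share per document
library(quanteda)

texts <- c(doc1 = "this is an outrage and a disgrace",
           doc2 = "I thank the honourable member for the question")
dfmat <- dfm(tokens(texts))   # W_i: document-feature matrix of word counts

# Hypothetical dictionary of "aggressive" terms
aggr_dict <- dictionary(list(aggressive = c("outrage", "disgrace", "attack")))

counts <- convert(dfm_lookup(dfmat, aggr_dict), to = "data.frame")$aggressive
pi_i   <- counts / ntoken(dfmat)   # pi_i = g(W_i), one value per document
pi_i
```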
Text data might be used to construct measures for any of \(Y\), \(D\), or \(X\).
Text as Outcome
Text as Treatment
Text as Control
In each case, the conclusions of our causal analysis will depend on the mapping function used.
Text is a rich source of information about the opinions, views and responses of individuals.
Most large text datasets collected in political science so far have used text as the outcome
This includes a long history of manual coding of open-ended survey responses and manual content analysis of documents.
Estimand: Text as Outcome
\[ \tau \equiv E[Y_{1i} - Y_{0i}] = E[g(W_{1i}) - g(W_{0i})] \]
Intuition: We want to know how the potential outcome of the mapping function, \(g()\), differs between treatment and control conditions.
What assumptions do we need in order to estimate this quantity?
Independence assumption: text as outcome
\[ g(W_{0i}), g(W_{1i}) \perp\!\!\!\perp D_i \]
or
\[ g(W_{0i}), g(W_{1i}) \perp\!\!\!\perp D_i|X_i \]
Intuition:
If \(D\) is randomized (independent of the potential outcomes), then we can use the difference in means as an estimator of the average treatment effect.
If \(D\) is as good as random, conditional on \(X\), then we can estimate the average treatment effect by controlling for \(X\)
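A sketch of the estimation step, with hypothetical vectors pi (the \(g(W_i)\) scores for each document), d (the treatment indicator), and x (a covariate):

```r
# Text-as-outcome estimation (hypothetical pi, d, x)
mean(pi[d == 1]) - mean(pi[d == 0])   # difference in means: valid if D is randomised
summary(lm(pi ~ d))                   # same estimate, with standard errors
summary(lm(pi ~ d + x))               # if D is as good as random conditional on X
```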
The existence of the mapping function in any causal text analysis can create difficulties.
Problem one: Overfitting and Fishing
Typically, we look at the data in order to generate a good mapping function
This creates the potential for overfitting and fishing
E.g. continuously “refine” a dictionary until the scores from that dictionary differ significantly between treatment and control units
Problem two: Violations of Causal Assumptions
Causal inference methods require assumptions to identify average causal effects
A key assumption, the Stable Unit Treatment Value Assumption (SUTVA), requires that each unit’s potential outcomes are not affected by any other unit’s treatment status
In many methods we study, the mapping function, \(g()\), depends on the texts of all units. E.g. topic models, where the estimated topics (and hence each document’s topic proportions) depend on the full corpus
These represent SUTVA violations by design, because the measure for each unit will be dependent on all other units
Problem: simultaneously discovering \(g()\) and estimating sources of variation (causes) in \(g()\) can lead to erroneous conclusions.
Solution: Split the discovery and estimation of variation in \(g()\) into separate parts of the research process
Three alternative approaches:
1. Define \(g()\) before looking at the documents
2. Run sequential studies
3. Use a train/test split (see the sketch below)
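A sketch of the third approach in R, assuming a hypothetical dfm dfmat, hypothetical metadata meta with a treatment indicator, and a dictionary-based mapping function like the one sketched earlier (aggr_dict is again hypothetical):

```r
# Train/test split: discover g() on one half, estimate effects on the other
library(quanteda)
set.seed(99)
train_ids <- sample(seq_len(ndoc(dfmat)), size = floor(ndoc(dfmat) / 2))

dfm_train <- dfmat[train_ids, ]   # discovery set: explore freely, fit models,
dfm_test  <- dfmat[-train_ids, ]  # refine dictionaries; fix g() here only

# Suppose exploration of dfm_train leads us to a dictionary-based g():
g <- function(x) convert(dfm_lookup(x, aggr_dict), to = "data.frame")[, 2] / ntoken(x)

pi_test <- g(dfm_test)                      # apply the *frozen* mapping to held-out texts
t.test(pi_test ~ meta$treat[-train_ids])    # estimate on the test set only
```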
Example: respondents in a survey experiment were randomly assigned to one of the two vignettes below, and the resulting text data were analysed using a structural topic model (i.e. \(g()\))
Treatment
A 28-year-old single man, a citizen of another country, was convicted of illegally entering the US. Prior to this offense, he had served two previous prison sentences each more than a year. One of these previous sentences was for a violent crime and he had been deported back to his home country.
Control
A 28-year-old single man, a citizen of another country, was convicted of illegally entering the US. Prior to this offense, he had never been imprisoned before.
(Here, \(g()\) also includes an element of manual coding.)
Estimand: Text as Treatment
\[ E[Y_{1i} - Y_{0i}] \]
where
\(Y_{1i}\) is the potential outcome for unit \(i\) when the text assigned to unit \(i\) has a value of \(g(W_i) = 1\)
\(Y_{0i}\) is the potential outcome for unit \(i\) when the text assigned to unit \(i\) has a value of \(g(W_i) = 0\)
Intuition: We want to know how the potential outcomes for a unit differ between treatment and control conditions, where the treatment status is determined by the text.
Previously, we said that the difference in means is an unbiased estimator if the treatment is randomized.
Independence assumption: text as treatment
\[ Y_{0i},Y_{1i} \perp\!\!\!\perp g(W_i) \]
Here we also require that the output of the mapping function does not correlate with other features of the texts that might affect the outcome:
Sufficiency assumption: text as treatment
We have to assume either that
1. the measured treatment, \(g(W_i)\), captures all features of the text that affect the outcome, or
2. any other latent features of the text that affect the outcome are uncorrelated with \(g(W_i)\)
Implication: Randomization of texts alone is insufficient to identify the causal effect of a latent treatment.
Grimmer and Fong (2021) investigate which topics of Donald Trump tweets (\(D\)) were most/least appealing to voters (\(Y\))
\(g()\) is a type of topic model estimated on 752 tweets \(\rightarrow D_i\), the topic of each tweet
Donald Trump tweets randomly allocated to online survey respondents who evaluate them (5-point scale, great to terrible)
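A first-pass analysis of such an experiment might regress evaluations on topic indicators. This is a sketch with a hypothetical data frame evals (one row per respondent-tweet pair); as the discussion below notes, it does not by itself rule out latent confounding.

```r
# Regress tweet evaluations on estimated tweet topics
# (hypothetical: rating on the 5-point scale, topic = estimated topic of the tweet shown)
fit <- lm(rating ~ factor(topic), data = evals)
summary(fit)   # coefficients: mean favourability differences vs the baseline topic
```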
Question: Does this experiment allow us to estimate the causal effect of the topics on tweet favourability?
Potential problem: If the topics correlate with other features of the text (\(Z\)), we cannot necessarily attribute observed differences in tweet favourability to the topics!
Which other features might correlate with these topics?
Whenever we have a latent treatment concept, we have to assume that our measured treatment, \(g(W_i)\), is uncorrelated with any other unmeasured treatment in the text
We can try and control for potentially confounding treatments, but we have to be able to work out what they are and how to measure them!
Text is often used as a basis of treatments in social science experiments, though typically it is not treated as “data” in any systematic way
A very common form of experiment is a survey experiment in which some respondents are exposed to a treatment text while others are exposed to a control text
Texts are thought to differ in terms of some underlying concept of theoretical interest
This is really just a human-constructed mapping function! \(g(W_i)\) is constructed by the researcher before the experiment such that
\(D_i = g(W_i) = 1\) if the researcher deems text \(i\) to be representative of the concept of interest
\(D_i = g(W_i) = 0\) if the researcher deems text \(i\) to be unrepresentative of the concept of interest
It remains possible that the treatment texts written by a researcher differ in multiple (unintended) ways
Let’s imagine that we are interested in assessing the effectiveness of different forms of political rhetoric.
Key issue: the treatment we wish to test is latent, and we cannot directly manipulate latent properties of the text.
How might we estimate the effect of the latent concept of interest?
Write texts that differ only in terms of the latent concept
Write multiple texts for each latent concept and marginalise over confounding features
In this paper, we use option two (see the sketch below):
14 different rhetorical strategies
12 different policy issues
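A sketch of how option two might be analysed, using a hypothetical data frame dat with one row per respondent-text pair (y = persuasiveness outcome, style = one of the 14 rhetorical strategies, issue = one of the 12 policy issues). With many texts per style, crossing styles with issues lets the style estimates average over text-specific idiosyncrasies.

```r
# Marginalising over multiple texts per rhetorical style (hypothetical data)
fit <- lm(y ~ factor(style) + factor(issue), data = dat)
summary(fit)   # style effects averaged across the policy issues
```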
The estimate for any individual text might be confounded by any unmeasured latent treatment
The average effect for a given rhetorical style could be confounded if an unmeasured latent treatment is correlated with the style across texts
There is little evidence that text-based confounding is a problem in this application.
Definition: Independence assumption, text as confounder:
\[ Y_{1i},Y_{0i} \perp\!\!\!\perp D_i|g(W_i) \]
Intuition: When used as a control, we are assuming that once we condition on the text, \(W_i\), via some low-dimensional summary (\(g()\)), the potential outcomes are independent of our treatment.
Question: How do we “control” for the “content” of a text?
The “content” of a text doesn’t have a well-defined operationalization
The difficulty is that we do not know a priori which aspects of the text are related to both treatment and outcome
The choice of \(g()\) will likely be important and lead to different substantive answers
There is no statistical answer to this question! We need to think hard about the context and select the representation that we believe captures the relevant confounding concept
Research question: Does female authorship reduce citations?
In political science, articles written by women receive fewer citations on average than articles written by men
Even when controlling for tenure, rank, university, and publication venue, the gender-citation gap persists
However, author gender is not randomly assigned \(\rightarrow\) we cannot necessarily interpret this difference as causal
Why?
Women may write about different topics than men, and the textual content of the article might determine citations
\(\rightarrow\) text might be a confounder of the relationship between gender and citations
The most common approach for adjusting potential confounders in the social sciences is to include measures for those confounding factors in a regression
Given that for each document, our dfm records the number of times each word occurs within that document, can we just use our dfm for \(X\)?
The difficulty with this approach is that the dfm is very high-dimensional, normally with many more variables (\(P\)) than observations (\(N\))
Standard regression techniques break down when \(N<P\), meaning that we cannot simply include everything in an OLS model
Two Strategies:
Control for words in a penalized regression (e.g. Lasso regression; ridge regression)
Control for low-dimensional summary of texts, rather than for words directly (e.g. topic model)
OLS:
\[\arg \min_\beta \sum_{i=1}^N \left( y_i - \alpha - \sum_{j=1}^J\beta_jx_{i,j}\right)^2\]
Lasso:
\[\arg \min_\beta \sum_{i=1}^N \left( y_i - \alpha - \sum_{j=1}^J\beta_jx_{i,j}\right)^2 + \color{red}{\lambda\sum_{j=1}^J|\beta_j|}\]
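A sketch of how the penalised objective above can be implemented with glmnet. All objects here are hypothetical: dfmat (a dfm of article texts), female (the author-gender treatment indicator), and cites (the citation outcome). Leaving the treatment variable unpenalised, so the lasso can never drop it, is a choice of this sketch rather than a detail from the paper.

```r
# Strategy 1: control for words via lasso regression
library(glmnet)
library(quanteda)

W <- convert(dfmat, to = "matrix")   # word counts as a numeric matrix
X <- cbind(female = female, W)       # treatment column + word controls

fit <- cv.glmnet(x = X, y = cites,
                 alpha = 1,                               # alpha = 1 -> lasso penalty
                 penalty.factor = c(0, rep(1, ncol(W))))  # never penalise `female`
coef(fit, s = "lambda.min")["female", ]   # treatment coefficient after word selection
```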
In this example, the authors regress the citations on…
The Lasso model selects the words and covariates that are highly predictive of the outcome
Highly predictive words include: cause, effect, z-score; democ, anticolon, surgenc; weingast, shepsl
Even controlling for all these words, the coefficient on the treatment variable is still negative
The Lasso model is excellent at selecting predictors of citations, but has some limitations: individual words are hard to interpret as substantive confounding concepts, and they may be noisy proxies for the topical content that actually drives citations
An alternative is to first generate a low-dimensional summary (\(g()\)) of the texts, and then include the summary in a regression as a control
In this case, it makes sense to use a topic model: topical differences between articles are a plausible source of confounding in the relationship between gender and citations
Approach:
Estimate: \(Y_i = \alpha + \beta_0 Female_i + \sum_{k=1}^K\beta_k\theta_{k,i} + \epsilon_i\)
where \(\theta_{k,i}\) is the proportion of document \(i\) devoted to topic \(k\) from a structural topic model.
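A sketch of this approach with the stm package. The objects here are hypothetical: out (documents and vocabulary from stm::prepDocuments), meta (metadata containing female and cites), and K = 60 is an arbitrary choice for illustration, not the value used in the paper.

```r
# Strategy 2: control for topic proportions from a structural topic model
library(stm)

stm_fit <- stm(documents = out$documents, vocab = out$vocab,
               K = 60, prevalence = ~ female, data = meta,
               init.type = "Spectral")

theta <- stm_fit$theta   # theta[i, k]: proportion of document i on topic k, i.e. g(W_i)

# Drop one topic column: proportions sum to 1, so all K would be collinear
summary(lm(cites ~ female + theta[, -1], data = meta))
```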
Findings:
Papers about “war;conflict;…”, “trade;intern;…”, “intern;polit;…” and “variabl;model;…” receive more citations, on average
Even controlling for all these topics, the coefficient on the treatment variable is still negative
The intersection of causal inference and quantitative text analysis is a new frontier in quantitative social science research
Text can enter a causal analysis as the outcome, the treatment, or a control; in each case, this requires mapping the high-dimensional texts to a lower-dimensional summary to be included in the analyses
The use of a mapping function raises a set of issues that we must consider carefully in order to make valid inferences