10  Developing a text-analysis research project

10.1 Develop your research project

In today’s seminar we will use two datasets to learn how to develop a research project. Both come from the New York Times: the first contains articles published in April 2018, and the second contains the comments readers posted on those articles. The comment sections are very active and offer a glimpse of how readers respond to the issues the articles cover. Both datasets also provide a wealth of additional metadata.

Starting from the data, you are expected to draft a research proposal covering your research question, the key concepts, the text-as-data method, and the empirical analysis. There is no single correct “solution”. The main goal of today’s tutorial is to practise translating a research question into an appropriate empirical analysis.

You can download the data from this link.

10.2 Data

Start by looking at the data. The first dataset includes articles published in the NYT in April 2018. The most useful text fields are the headline and snippet variables, but the dataset also includes other metadata that can help you define a research question.

The second dataset includes comments posted in response to these articles. The articleID variable identifies the article each comment belongs to, and the comment text is stored in commentBody. Here again, the dataset includes other metadata that can help you define a research question.
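
A natural first step is to attach each comment to its article through articleID. The sketch below is a minimal illustration, assuming both datasets are available as CSV files; only the column names articleID, headline, snippet and commentBody come from the description above. The in-memory toy rows, the helper names `load_rows` and `join_comments`, and the combined `text` field (headline followed by snippet) are hypothetical.

```python
import csv
import io

# Hypothetical toy extracts standing in for the real CSV files. Only the
# column names articleID, headline, snippet and commentBody come from the
# dataset description; the rows are invented for illustration.
ARTICLES_CSV = """articleID,headline,snippet
a1,Trade Tariffs Rattle Markets,Stocks fell after new tariffs were announced.
a2,A New Season of Gardening,Spring planting tips for city dwellers.
"""

COMMENTS_CSV = """articleID,commentBody
a1,This policy will hurt farmers.
a1,Markets always overreact.
a2,Great tips!
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_comments(articles, comments):
    """Attach each comment to its article via the shared articleID key."""
    joined = {}
    for row in articles:
        # Hypothetical combined document: headline followed by snippet.
        text = f"{row['headline']} {row['snippet']}"
        joined[row["articleID"]] = {**row, "text": text, "comments": []}
    for c in comments:
        article = joined.get(c["articleID"])
        if article is not None:  # skip comments whose article is not in the data
            article["comments"].append(c["commentBody"])
    return joined


joined = join_comments(load_rows(ARTICLES_CSV), load_rows(COMMENTS_CSV))
for article_id, article in joined.items():
    print(article_id, len(article["comments"]))  # number of comments per article
```

With the real files, `io.StringIO(text)` would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`; the `csv` module handles quoted comment bodies that contain commas or line breaks.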


10.2.1 Tasks

  1. Formulate a research question that can be answered using this data.

  2. Identify the key concepts you need to measure in order to answer your research question.

  3. Choose a measurement strategy covered in this course to represent those concepts. You can use any method we have studied (dictionaries, topic models, supervised learning, etc.), but your choice should be informed by the substantive case we are working with.

  4. Choose your econometric specification. What is the relationship that you want to model?

  5. Evaluate possible shortcomings of your approach.

  6. Once you have a full proposal, start with the implementation!
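
As one possible illustration of tasks 3 and 4, suppose the (hypothetical) research question is whether articles with more negatively worded headlines attract more comments. A minimal sketch could measure negativity with a dictionary (the share of tokens found in a word list) and relate it to comment counts with a bivariate OLS regression. The word list `NEGATIVE_WORDS`, the texts and the comment counts are invented for illustration. A real project would use a validated lexicon, add controls such as section or publication date, and, since comment counts are non-negative integers, consider a count model such as Poisson or negative binomial regression.

```python
import re
from statistics import mean

# Hypothetical mini-dictionary; a real project would use a validated lexicon.
NEGATIVE_WORDS = {"fell", "hurt", "crisis", "rattle", "attack", "fear"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_score(text, lexicon):
    """Share of tokens that appear in the lexicon (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def ols_simple(x, y):
    """Intercept and slope of y = a + b*x by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical headlines and comment counts, invented for illustration.
texts = [
    "Stocks fell as tariffs rattle markets",
    "Spring planting tips for city gardens",
    "Fear of a trade crisis grows",
    "A quiet week for local theatre",
]
n_comments = [120, 15, 90, 10]

x = [dictionary_score(t, NEGATIVE_WORDS) for t in texts]
intercept, slope = ols_simple(x, n_comments)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With a single binary-looking regressor like this toy one, the slope simply rescales the difference in mean comment counts between negative and neutral headlines; with real data the dictionary score varies continuously, and the shortcomings of the word list itself belong in task 5.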

10.3 Homework: Continue with the exercise

Complete the exercise started in class, and upload your answers, code, and results to this Moodle page.