Datathons

The datathon exercises are designed to help you develop the expertise you will need for the final, summative assessment.

You will be working in small groups on these exercises. Your groups will be assigned randomly for each datathon exercise. You will have two weeks to complete each exercise and you will present your results to the class during the seminars.

The model for the exercise is that of a data hackathon. A data hackathon is an intensive exercise that asks researchers to do their best to turn information into knowledge. Data hackathons (datathons) use research questions and datasets to advance knowledge.

For each exercise you will frame a research question, create and implement a research design, mobilize data resources and present your findings in the form of an R Presentation.

If you wish to refresh your knowledge of research design, you should read the first four chapters of Kellstedt, Paul M. and Guy D. Whitten (2013) The Fundamentals of Political Science Research, 2nd edition, Cambridge University Press. This was part of the summer pre-readings for the course. There are copies of the book in the library if needed.

R Presentation structure

Think carefully about what to include on the slides. The content shouldn’t be dense, so include only the absolutely necessary material. You should follow the structure of the presentation below. You may go beyond eight slides but you cannot have more than eleven slides overall. You will have 10 minutes for your presentation in class. You should also practice your presentation before class.

  • Slide 1: Names, Date, Module.
  • Slide 2: Introduction
    • What social/policy question was asked or challenge addressed? Why is this question important or the challenge critical?
    • What datasets were used?
    • What is the novel contribution?
    • What is the key methodology or methodologies used?
    • What are the key findings?
  • Slide 3: Previous Work
    • A couple of relevant pieces of previous work on the topic of your research.
  • Slide 4: Theoretical Framework
    • List the hypotheses that you’re testing in this project. No more than two hypotheses.
  • Slide 5: Data description
    • You should provide summary statistics for your variables.
    • A visualization of the main trends in your variables. For the visualization you can use ggplot2.
  • Slide 6: Methodology
    • This slide serves the purpose of detailing how you will go about testing the hypotheses laid out above.
    • Describe the methods you use to test the relationship and the operationalization of your variables (outcome variable, key predictors, and controls). If you’re developing a prediction model, discuss how your predictions address the hypotheses above.
  • Slide 7: Results
    • Detail findings from most important to least. Begin, therefore, with the results as they pertain to your hypotheses: can you reject your hypotheses? How can your results (e.g. from a prediction model) be validated?
    • You should present the findings in summary form: a table of results or a graph. Use texreg (or stargazer, apsrtable or equivalent) to present results in tabular form. You can also use coefplot-style plots (or any other visualization of results).
  • Slide 8: Conclusion and Discussion
    • Provide an interesting overview of the study. What have we learned?
    • Next, discuss the limitations of your study and what could be done to improve or build upon it. You can demonstrate some self-criticism.
    • Finally, and perhaps most importantly, discuss the policy implications of your study.
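
The Data description and Results slides above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the built-in mtcars dataset, the variables mpg, wt and hp, and the two model specifications are placeholders chosen for the example, not the dataset or models specified for any datathon.

```r
# Illustrative sketch only: mtcars is a built-in placeholder dataset,
# not the dataset specified for any datathon exercise.
library(ggplot2)
library(texreg)

data(mtcars)

# Slide 5: summary statistics for the outcome, key predictor and a control
vars <- c("mpg", "wt", "hp")
summary(mtcars[, vars])

# Slide 5: one plot of the main relationship between predictor and outcome
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)

# Slide 7: a bivariate model and a model that adds a control
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Side-by-side results table printed as plain text
screenreg(list(m1, m2))
```

For HTML slides, texreg's htmlreg() accepts the same list of models; it would typically go in a code chunk with results = "asis" so the table renders as HTML rather than as raw text.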

You can create a presentation in RStudio using the File > New File > R Presentation menu. You can also download a presentation template below to see what the presentation should look like.

Submission

  • Save your presentation as HTML and upload it to the Datathon Assignments folder on Moodle before 2pm on the day that it’s due.
  • Load your .Rpres and .html files on a USB flash drive and bring it to class.

What we are looking for

A strong project will:

  • Follow the structure of the project outlined above, and consist of a maximum of 11 slides.
  • Employ the dataset specified.
  • Include a high-quality visualization.
  • Develop an empirical model that allows you to test the stated hypotheses.
  • Generate an interesting empirical finding.
  • In addition, a strong presentation should be well written and show some creativity in how it uses or combines data. Slides shouldn’t be dense with text.