9 Language Models (III): Neural Networks, Transfer Learning and Transformer Models – PUBL0099: Quantitative Text Analysis for Social Science

9.1 Lecture slides

9.2 Getting Started With Transformers

This seminar will introduce you to transformer models, as implemented in Python using the transformers library from HuggingFace. At present, this is by far the most straightforward way of using transformer models, and there is no good alternative in R. A key reason for this is that HuggingFace, which hosts transformer models and their corresponding tokenisers, is designed to integrate with Python. HuggingFace provides a hub and an associated library that make it relatively easy to retrieve pretrained models, fine-tune them, and share them so that other researchers can modify them further. All of this means we will need to work in Python, but don’t worry: we do not expect you to write any original Python code. All the code needed to perform inference and fine-tune a model has been provided for you. We also do not expect you to understand this code for the assessment. These materials are intended as a preliminary introduction and as a resource that you can come back to if you ever want to use transformers in future research.

A key challenge in getting started with Python is the setup. Unlike R with RStudio, Python has no single recommended interface. Instead, there are many options, including Jupyter Notebook, Spyder, VS Code, and PyCharm, each suited to different workflows. Setting these up on your machine can be tricky, not least because of the risk of package dependency conflicts, which are more common in Python than in R. To avoid these problems, we will be using Google Colab. Google Colab provides a temporary virtual environment for running Python that requires no setup. All that is required to use it is a Google account. To open the seminar materials in Google Colab, simply click the link below.

Open Seminar in Google Colab

9.3 Setting up Python on your machine (optional)

If you would like to set up Python on your machine, there are some useful links below. Note that it is recommended to use virtual environments to avoid package dependency conflicts. ‘conda’ (part of ‘Anaconda’) is one popular system for doing this (see link 1), but you can also download Python (link 2) and set up virtual environments (link 3) without conda, using venv or virtualenv. If you are transitioning to Python from R (which is likely), then ‘Spyder’ (link 4) is the closest thing to RStudio. Spyder comes as part of Anaconda, but it can also be downloaded and installed without conda. Alternatively, you can run Python within an R session using the reticulate package (link 5).
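To make the idea of a virtual environment more concrete, here is a minimal sketch that creates one using the venv module from Python's standard library. The environment name `qta-env` and the temporary folder are assumptions chosen purely for illustration, and `with_pip=False` is used only so that the example runs quickly. In practice, most people create environments from the terminal with `python -m venv qta-env`, which does the same job.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target_dir: Path) -> Path:
    """Create a virtual environment in target_dir and return its config file path."""
    # with_pip=False skips installing pip to keep this sketch fast;
    # for real work you would normally want with_pip=True.
    venv.create(target_dir, with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg file at its root.
    return target_dir / "pyvenv.cfg"


# Hypothetical location: a throwaway folder, so nothing on your machine is changed.
env_dir = Path(tempfile.mkdtemp()) / "qta-env"
config_file = create_environment(env_dir)

print("Environment created at:", env_dir)
print("Config file exists:", config_file.exists())
```

Packages installed into an environment like this one stay separate from those in your other projects, which is what prevents dependency conflicts.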