- Much of this material was adapted from materials produced by Software Carpentry. Thank you!
In this unit, we'll use Python to turn a collection of loose text documents into a real-life database. (Note: This database was created for a project by R. Terman and E. Voeten, and was built using much the same process you'll be learning here.)
The lecture and problem set will draw on your new Python skills, especially working with text, lists, and dictionaries; writing for-loops, conditional statements, and functions; and "thinking" like a programmer.
About the Data
We'll be creating a database from Universal Periodic Review outcome reports.
The Universal Periodic Review (UPR) is a process run by the United Nations Human Rights Council, which involves a periodic review of the human rights records of all 193 UN Member States.
Reviews take place through an interactive discussion between the State under review and other UN Member States. During this discussion, any UN Member State can pose questions, offer comments, and/or make recommendations to the State under review. The State under review can then respond, stating which recommendations it rejects, accepts, will consider, etc. Reports are then drawn up detailing this discussion.
We will be analyzing outcome reports from the 2014 Universal Periodic Reviews of 42 countries, which we retrieved here and formatted as text documents.
The goal is to convert these semi-structured texts to a tabular dataset of recommendations with the following variables:
- Text of recommendation (text)
- Country to which the recommendation is directed (to)
- Country that is making the recommendation (from)
- The year when the review took place (year)
- The response to the recommendation, i.e., whether the reviewed country rejects, accepts, etc. (decision)
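As a rough sketch of what that target structure could look like in Python, each recommendation can be stored as a dictionary keyed by the five variable names, and the whole dataset as a list of such dictionaries. The rows below are hypothetical placeholders, not real UPR data, and the helper name `to_csv` and the decision labels are assumptions made for illustration; the actual parsing of the reports is not shown.

```python
import csv
import io

# Column names taken from the variable list above.
FIELDNAMES = ["text", "to", "from", "year", "decision"]

# Hypothetical placeholder rows (NOT real UPR data); each dictionary is one recommendation.
recommendations = [
    {
        "text": "Example recommendation A",
        "to": "Country X",      # country under review
        "from": "Country Y",    # country making the recommendation
        "year": 2014,
        "decision": "accepted",  # assumed label for illustration
    },
    {
        "text": "Example recommendation B",
        "to": "Country X",
        "from": "Country Z",
        "year": 2014,
        "decision": "rejected",  # assumed label for illustration
    },
]


def to_csv(rows, fieldnames=FIELDNAMES):
    """Serialize a list of dictionaries to CSV text, one row per recommendation."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(recommendations)
print(csv_text)
```

A list of dictionaries is only one reasonable choice here; it maps directly onto a table, since each dictionary is a row and each key is a column.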
In other words, we want to turn this:
into this: