This tutorial is based on the 2023 GESIS fall seminar "Automated Web Data Collection with Python" and has two parts. The first part discusses web APIs as a data source, using the MediaWiki API, which powers Wikipedia, as an example. The second part discusses how to collect data from static web pages with Python. Each part consists of lecture units and corresponding exercises with solutions.
- Part 1 - Wikipedia
- Part 2 - Static web scraping
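
As a rough illustration of what the two parts cover, the sketch below builds a query URL for the MediaWiki Action API and extracts links from a static HTML snippet. It is not taken from the seminar material: it uses only the Python standard library, sends no network request, and the helper names (`build_query_url`, `LinkCollector`) and the sample HTML are made up for this example. The lecture units may well rely on other libraries.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Public endpoint of the English Wikipedia MediaWiki Action API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Return an API URL asking for basic info on one page (hypothetical helper).

    Only the URL is built here; no request is sent.
    """
    params = {
        "action": "query",
        "titles": title,
        "prop": "info",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Made-up HTML snippet standing in for a downloaded static page.
SAMPLE_HTML = (
    '<ul><li><a href="/part1">Part 1</a></li>'
    '<li><a href="/part2">Part 2</a></li></ul>'
)

# Part 1: the URL a client would request from the API.
api_url = build_query_url("Web scraping")

# Part 2: parse the static HTML and collect its links.
collector = LinkCollector()
collector.feed(SAMPLE_HTML)
page_links = collector.links

print(api_url)
print(page_links)
```

In practice the URL would be fetched with an HTTP client and the JSON response decoded, while the HTML would come from a downloaded page instead of a string literal.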