MIT OpenCourseWare Crawler

Crawl Output

Last updated: November 27, 2023

Description

This is a simple crawler that saves the available courses on MIT OpenCourseWare. It exports the courses with video lectures as a CSV file.

You can crawl for courses other than those with video lectures by changing @start_urls in crawler.rb.
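
For reference, a Kimurai spider declares its start URLs as a class-level setting. Here is a minimal sketch of what that might look like; the class name, engine, and example URL are placeholders, not the actual contents of crawler.rb:

# Illustrative sketch only; crawler.rb may be structured differently.
require "kimurai"

class OcwSpider < Kimurai::Base
  @name = "ocw_spider"
  @engine = :selenium_firefox   # matches the geckodriver/Firefox requirement below
  # Swap this placeholder URL for another OCW listing page to crawl other course types.
  @start_urls = ["https://ocw.mit.edu/search/"]

  def parse(response, url:, data: {})
    # Extract course titles and links here, then append them to results.csv.
  end
end

OcwSpider.crawl!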

Docker Run (Recommended)

This is the simplest way to run the crawler. The container saves the results to results.csv on the host via a Docker volume.

$ docker build -t ocw-crawl:1.0 .
$ docker run --volume $(pwd)/results.csv:/app/results.csv \
             --rm \
             --name ocw-crawl \
             ocw-crawl:1.0

Manually Run

To run the crawler without Docker, you'll need to install an older version of Ruby that's compatible with kimurai. You'll also need geckodriver and Firefox. Read more about setting up kimurai here if you run into trouble.

Setup

Install Ruby 2.5.0 and run bundle install.

$ asdf install ruby 2.5.0
$ asdf global ruby 2.5.0
$ gem install bundler
$ bundle install # install dependencies

Run

$ ruby crawler.rb
...

Possible Improvements

  • Use OCW Sitemaps to crawl all courses (see the sketch after this list)
  • Get more information about each course from the sitemap
    • Course materials often follow these patterns:
      • Syllabus: /pages/syllabus/
      • Course download: /download/
      • Resources: /resources/*/
        • PDFs, slides, lectures notes, etc.
      • Course pages: /pages/*/
        • Readings: /pages/readings/
  • Turn the data into an app or API
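
The sitemap idea could be prototyped with plain Ruby before wiring it into the Kimurai spider. The sketch below assumes the sitemap is served at https://ocw.mit.edu/sitemap.xml (it may in fact be a sitemap index pointing at per-course sitemaps) and simply buckets URLs by the path patterns listed above:

# Rough sketch, not part of the crawler; the sitemap URL is an assumption.
require "net/http"
require "uri"
require "nokogiri"

xml  = Net::HTTP.get(URI("https://ocw.mit.edu/sitemap.xml"))
urls = Nokogiri::XML(xml).remove_namespaces!.xpath("//url/loc").map(&:text)

# Bucket URLs by the course-material path patterns listed above.
patterns = {
  syllabus:  %r{/pages/syllabus/},
  download:  %r{/download/},
  resources: %r{/resources/[^/]+/},
  readings:  %r{/pages/readings/}
}

grouped = Hash.new { |h, k| h[k] = [] }
urls.each do |url|
  kind = patterns.find { |_, re| url.match?(re) }&.first || :other
  grouped[kind] << url
end

grouped.each { |kind, list| puts "#{kind}: #{list.size} URLs" }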