A web scraping utility that collects information from Indeed surrounding business job posting habits.

DResthal/JobCrawler

JobCrawler - Scrapy Web Crawler

JobCrawler is a full ETL pipeline that collects job listing information from Indeed.com, parses, cleans, and transforms the results, and sends them to an API for further data validation and storage.
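As a rough illustration of the clean-and-transform stage described above, the following minimal sketch normalizes one scraped listing. The function name `clean_listing` and the field names (`title`, `company`, `location`) are assumptions made for this example and are not taken from the project's code.

```python
import re


def clean_listing(raw: dict) -> dict:
    """Normalize one scraped listing (hypothetical field names)."""

    def tidy(value):
        # Treat missing values as empty, collapse whitespace runs, trim the ends.
        return re.sub(r"\s+", " ", value or "").strip()

    return {
        "title": tidy(raw.get("title")),
        "company": tidy(raw.get("company")),
        "location": tidy(raw.get("location")),
    }
```

A function like this could be called from a Scrapy item pipeline's `process_item` method, so that each item is cleaned before it moves on to storage or delivery.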

This project was originally intended to collect information regarding job listings, companies and their posting habits for use in a web application (JobStat) as a live dashboard and research tool for job seekers.

Currently, the project collects only Data Analytics and Software Development (Python) job listings.

Tech Stack

  • Python 3.10
    • Scrapy
  • Bash
    • Shell scripts for database backups to Amazon S3
  • AWS
    • EC2 instance as a deployment server
    • S3 for database backup storage

Deployment


To Dos

  1. Implement CI/CD
  2. Finalize documentation
  3. Convert the data store step to send results to the API
    3a. [ ] Send JSON to API
    3b. [ ] Handle server responses appropriately
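The two open sub-items could be approached along the lines of the sketch below, which uses only the Python standard library. The endpoint `API_URL`, the function names, and the policy of mapping status codes to `stored`, `retry`, or `rejected` are all assumptions for illustration; the real API contract is not documented here.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://example.invalid/listings"  # hypothetical endpoint


def build_request(item: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the item as a JSON body."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_status(status: int) -> str:
    """Map an HTTP status code to an action (assumed policy)."""
    if 200 <= status < 300:
        return "stored"
    if status == 429 or status >= 500:
        return "retry"  # rate limiting or server trouble: try again later
    return "rejected"  # other client errors: the payload likely needs fixing


def send_item(item: dict, url: str = API_URL) -> str:
    """Send one item and report how the server response should be handled."""
    try:
        with urllib.request.urlopen(build_request(item, url), timeout=10) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError and carry a status code.
        return classify_status(err.code)
    except urllib.error.URLError:
        # Network-level failure (DNS, refused connection, timeout).
        return "retry"
```

Keeping the status classification separate from the network call makes the response handling easy to test without a live server.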
