# JobCrawler - Scrapy Web Crawler

JobCrawler is a full ETL pipeline that collects job listings from Indeed.com; parses, cleans, and transforms the results; and sends them to an API for further validation and storage.

This project was originally built to collect information about job listings, companies, and their posting habits for JobStat, a web application serving as a live dashboard and research tool for job seekers.

At this time, the project only collects Data Analytics and Software Development (Python) job listings.
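The clean/transform and send-to-API steps described above could be sketched as a Scrapy item pipeline. The field names, normalization rules, and API URL below are assumptions for illustration, not taken from the project:

```python
import json
import re
from urllib import request


class JobCleaningPipeline:
    """Cleans a scraped job listing, then posts it to the API as JSON.

    A minimal sketch: the field names and endpoint are hypothetical.
    """

    API_URL = "https://example.com/api/jobs"  # hypothetical endpoint

    def process_item(self, item, spider=None):
        # Normalize the text fields before handing the record to the API.
        return {
            "title": self._normalize(item.get("title", "")),
            "company": self._normalize(item.get("company", "")),
            "location": self._normalize(item.get("location", "")),
        }

    @staticmethod
    def _normalize(text):
        # Collapse runs of whitespace and strip leading/trailing space.
        return re.sub(r"\s+", " ", text).strip()

    def send(self, cleaned):
        # POST the cleaned record as JSON (shown for shape only).
        req = request.Request(
            self.API_URL,
            data=json.dumps(cleaned).encode(),
            headers={"Content-Type": "application/json"},
        )
        return request.urlopen(req)
```

Scrapy item pipelines are plain classes, so the cleaning step above can be unit-tested without a running crawler.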

## Tech Stack

- Python 3.10
  - Scrapy
- Bash
  - Shell scripts for database backups to Amazon S3
- AWS
  - EC2 instance as the deployment server
  - S3 for database backup storage
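The backup scripts mentioned above might look something like the following sketch. The database name, S3 bucket, and use of Postgres are assumptions, not details from the project:

```shell
#!/usr/bin/env bash
# Sketch of a database backup to Amazon S3 (db name, bucket, and
# Postgres are assumptions; adapt to the actual data store).
set -euo pipefail

DB_NAME="${DB_NAME:-jobcrawler}"
S3_BUCKET="${S3_BUCKET:-s3://jobcrawler-backups}"  # hypothetical bucket

# Build a timestamped backup filename, e.g. jobcrawler_20240101_020000.sql.gz
backup_filename() {
  printf '%s_%s.sql.gz' "$1" "$(date +%Y%m%d_%H%M%S)"
}

run_backup() {
  local file
  file="$(backup_filename "$DB_NAME")"
  # Dump, compress, and stream straight to S3 without a local temp file.
  pg_dump "$DB_NAME" | gzip | aws s3 cp - "${S3_BUCKET}/${file}"
}

# Invoke run_backup from cron or a systemd timer on the EC2 instance.
```

Streaming the dump through `gzip` into `aws s3 cp -` avoids keeping a local copy on the EC2 instance's disk.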

## Deployment
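One way to schedule the crawl and the backup on the EC2 instance is cron. The schedule, paths, and spider name below are assumptions for illustration:

```cron
# Run the Indeed spider nightly at 02:00, then back up the database.
0 2 * * *   cd /home/ubuntu/jobcrawler && scrapy crawl indeed >> crawl.log 2>&1
30 2 * * *  /home/ubuntu/jobcrawler/scripts/backup_to_s3.sh
```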


## To Dos

1. [ ] Implement CI/CD
2. [ ] Finalize documentation
3. [ ] Convert the data store to send to the API
   - [ ] Send JSON to the API
   - [ ] Handle server responses appropriately
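The "handle server responses" item could start from a sketch like this: classify the API's status code into an action, and retry only transient failures. The status ranges and retry policy are assumptions, not project decisions:

```python
import time


def classify_response(status: int) -> str:
    """Map an HTTP status code to a crawler action.

    2xx: item stored; 5xx: transient server error, retry later;
    anything else (e.g. a 400 validation failure): drop and log.
    """
    if 200 <= status < 300:
        return "ok"
    if 500 <= status < 600:
        return "retry"
    return "drop"


def deliver(post, payload, attempts=3, delay=0.0):
    """Try to POST payload, retrying transient (5xx) failures.

    `post` is any callable taking the payload and returning a status code.
    """
    for _ in range(attempts):
        action = classify_response(post(payload))
        if action != "retry":
            return action
        time.sleep(delay)
    return "retry"
```

Keeping the classification separate from the delivery loop makes the policy easy to test without a live API.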