2021-09-10
- ETL script is written in Python
- Python libraries used include pandas, PySpark, requests, SQLAlchemy, and the standard-library glob module
The ETL process consists of the following steps:
- Make API calls to extract dimension tables and fact data
- Store raw dimension and fact data as partitioned CSV files
- Load partitioned CSV files into SQLite database for staging and analysis
- Run SQL queries to answer the specific analysis questions and print the results to the console (see the sketch after this list)
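For reference, the extract-and-store steps might look roughly like the sketch below; the API endpoint, JSON layout, and partition column are illustrative assumptions, not the actual pipeline code.

# Sketch of the extract-and-store steps (hypothetical endpoint and schema).
import os
import pandas as pd
import requests

API_URL = "https://example.com/api/sales"  # placeholder endpoint

def extract_and_store(raw_dir="datalake/raw"):
    # Extract: pull fact data from the API as JSON.
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    df = pd.DataFrame(response.json())

    # Store: write one CSV file per partition value (here, partitioned by date).
    for date_value, partition in df.groupby("date"):
        partition_dir = os.path.join(raw_dir, f"date={date_value}")
        os.makedirs(partition_dir, exist_ok=True)
        partition.to_csv(os.path.join(partition_dir, "part-000.csv"), index=False)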
Follow the steps below to test the ETL process using sample JSON data files.
- Install the Python libraries
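For example, assuming pip is available (glob ships with the Python standard library, so only the third-party packages need installing):
pip install pandas pyspark requests sqlalchemy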
- Open a terminal window and cd to the 'pipeline' folder that contains the etl.py and query.py files:
cd c:/usr/documents/Project/pipeline
- Run the etl.py script to extract data from the API and load it into the data lake:
python etl.py
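When testing with the sample JSON data files, the extract step can read local files instead of calling the live API; a minimal sketch, with the folder name assumed for illustration:

# Sketch of the test path: read sample JSON files instead of calling the API.
import glob
import pandas as pd

def extract_from_samples(sample_dir="sample_data"):  # placeholder folder name
    frames = [pd.read_json(path) for path in glob.glob(f"{sample_dir}/*.json")]
    # Combine all sample files into a single raw DataFrame.
    return pd.concat(frames, ignore_index=True)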
- Run the query.py script to load the data-lake files into SQLite and run the queries:
python query.py
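Internally, the staging-and-query step could look roughly like the sketch below; the table name, database file, and SQL are placeholders, not the actual queries in query.py.

# Sketch of staging the partitioned CSVs in SQLite and running a query (placeholder names).
import glob
import pandas as pd
from sqlalchemy import create_engine, text

def stage_and_query(raw_dir="datalake/raw"):
    engine = create_engine("sqlite:///staging.db")

    # Stage: append every partitioned CSV into one staging table.
    for path in glob.glob(f"{raw_dir}/*/*.csv"):
        pd.read_csv(path).to_sql("fact_sales", engine, if_exists="append", index=False)

    # Query: run SQL against the staging table and print the results to the console.
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT COUNT(*) AS row_count FROM fact_sales")).fetchall()
        print(rows)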
Please see the "output.txt" file for an example of the console log of the pipeline after a test run.