All data necessary to generate the reports gets loaded as part of the data pipeline, except for the SafeGraph data. To prepare the SafeGraph data:
- Download the SafeGraph Patterns data from https://shop.safegraph.com/?countries=US&states=PA&cities=Philadelphia&poi=ALL&tab=datasets. You should end up with a zip file that has a bunch of other zip files inside of it named something like `PA-CORE_POI-PATTERNS-YYYY_MM.zip`.
- Upload that file to a `safegraph_patterns/` folder inside of your data bucket on Google Cloud Storage.
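
Before uploading, it can help to confirm that the download contains what you expect. The sketch below is only an illustration: `is_patterns_archive` is a hypothetical helper, and `download.zip` and `YOUR_BUCKET` are placeholders. The naming pattern is a loose guess based on the example name above, so treat a mismatch as a prompt to look closer rather than as an error.

```shell
# Hypothetical helper (not part of this project): succeeds if a file name
# follows the PA-CORE_POI-PATTERNS-YYYY_MM.zip shape mentioned above.
is_patterns_archive() {
  case "$1" in
    PA-CORE_POI-PATTERNS-[0-9][0-9][0-9][0-9]_[0-9][0-9].zip) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (placeholders: download.zip, YOUR_BUCKET):
#   unzip -Z1 download.zip | while read -r name; do
#     is_patterns_archive "$name" || echo "unexpected entry: $name"
#   done
#   gsutil cp download.zip gs://YOUR_BUCKET/safegraph_patterns/
```

If the inner files sit inside a subfolder of the archive, the listed names will include that path and the check will need adjusting.
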
- Build the docker image -- Navigate to the airflow/ folder and run `docker build . -f Dockerfile --tag final-project-airflow:0.0.1`.
- Set up your environment -- In the docker-compose.yml file, under `x-airflow-common` -> `volumes`, change the value for `/opt/google-app-creds.json` to the path to an appropriate service account key.
- Run Airflow -- From the airflow/ folder, first run `docker-compose up airflow_init`, then run `docker-compose up`.
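
A missing or empty service account key is an easy mistake to make in the environment step, so a small preflight check before starting Airflow may save a confusing failure later. This is a minimal sketch under assumptions: `check_key_file` is a hypothetical helper and `KEY_PATH` is an assumed variable holding the same host path you put in docker-compose.yml. It only checks that the file exists, is readable, and is non-empty; it does not validate the key itself.

```shell
# Hypothetical preflight helper (not part of this project).
# Succeeds only if the given path is a regular, readable, non-empty file.
check_key_file() {
  [ -f "$1" ] && [ -r "$1" ] && [ -s "$1" ]
}

# Example usage from the airflow/ folder (KEY_PATH is an assumed variable):
#   if check_key_file "$KEY_PATH"; then
#     docker-compose up airflow_init && docker-compose up
#   else
#     echo "service account key not found or empty: $KEY_PATH" >&2
#   fi
```
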