This repository contains the code used to sync the data from our index of open data on Socrata into a Seafowl instance.
This data will power the SocFeed app in the future.
In the meantime, see the Observable notebook that showcases this dataset.
- Every night (currently on demand), we initiate a download of the new snapshots of Socrata's Discovery API from Splitgraph in Parquet format
- This gives us a pre-signed S3 URL to download the file
- We use `CREATE EXTERNAL TABLE` on Seafowl with this URL to append this data to a history table (so the file never has to be downloaded to the GitHub Actions instance)
- Then, we use a non-dbt script that creates some derived tables (monthly/weekly/daily summaries) used by the SocFeed app (actual dbt support coming soon!)
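
The append step above can be sketched roughly as follows. This is a minimal illustration, not the script used in this repository: the function name, the table names `socrata_snapshot` and `socrata.dataset_history`, and the assumption that Seafowl exposes external tables under a `staging` schema are all hypothetical and should be checked against the Seafowl documentation and the actual code.

```python
def build_append_statements(
    presigned_url: str,
    staging_table: str = "socrata_snapshot",          # hypothetical name
    history_table: str = "socrata.dataset_history",   # hypothetical name
) -> list[str]:
    """Build the SQL that points Seafowl at a remote Parquet file and
    appends its rows to a history table."""
    # Double any single quotes so the URL is a valid SQL string literal.
    location = presigned_url.replace("'", "''")
    return [
        # Seafowl reads the Parquet file directly from the pre-signed URL,
        # so the CI runner never downloads it.
        f"CREATE EXTERNAL TABLE {staging_table} "
        f"STORED AS PARQUET LOCATION '{location}'",
        # Assumption: the external table is visible under the `staging` schema.
        f"INSERT INTO {history_table} SELECT * FROM staging.{staging_table}",
    ]
```

Each returned statement would then be sent to the Seafowl instance in order; how that request is made and authenticated depends on the deployment and is left out here.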