
Sync the Splitgraph Socrata dataset catalog history into Seafowl

splitgraph/socrata-to-seafowl

Socrata-to-Seafowl sync job

This repository contains the code used to sync the data from our index of open data on Socrata into a Seafowl instance.

This data will power the SocFeed app in the future.

In the meantime, see the Observable notebook that showcases this dataset.

How it works

  • The job is meant to run every night (currently it is triggered on demand). It starts by requesting an export of the latest snapshots of Socrata's Discovery API from Splitgraph, in Parquet format
  • The export gives us a pre-signed S3 URL for the Parquet file
  • We pass this URL to CREATE EXTERNAL TABLE on Seafowl and append the data to a history table. Seafowl reads the file directly, so the GitHub Actions instance never has to download it
  • Finally, we run a "not dbt" script that builds the derived tables (monthly, weekly, and daily summaries) used by the SocFeed app. Actual dbt support is coming soon!
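
The steps above can be sketched as a few SQL statements generated from the pre-signed URL. This is only an illustration, not the code in this repository: the table names (`socrata_snapshot`, `socrata_history`), the `snapshot_time` column, the row-count summary, and the helper functions are all hypothetical, and the sketch assumes that Seafowl exposes external tables under a `staging` schema and accepts `INSERT INTO ... SELECT`.

```python
from typing import List

# Summary granularities mentioned in the README (monthly/weekly/daily).
GRAINS = ("day", "week", "month")


def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_append_statements(
    presigned_url: str,
    snapshot_table: str = "socrata_snapshot",  # hypothetical name
    history_table: str = "socrata_history",    # hypothetical name
) -> List[str]:
    """Return SQL that appends a remote Parquet file to the history table.

    Assumes external tables are visible under the `staging` schema.
    """
    return [
        # Point Seafowl at the remote file; nothing is downloaded locally.
        f"CREATE EXTERNAL TABLE {snapshot_table} "
        f"STORED AS PARQUET LOCATION {sql_quote(presigned_url)}",
        # Append the new snapshot rows to the existing history table.
        f"INSERT INTO {history_table} SELECT * FROM staging.{snapshot_table}",
        # Drop the external table so the next run can reuse the name.
        f"DROP TABLE staging.{snapshot_table}",
    ]


def build_summary_statement(
    grain: str,
    history_table: str = "socrata_history",  # hypothetical name
    ts_column: str = "snapshot_time",        # hypothetical column
) -> str:
    """Return SQL for one derived summary table (an assumed row count per period)."""
    if grain not in GRAINS:
        raise ValueError(f"unsupported grain: {grain!r}")
    return (
        f"CREATE TABLE {grain}_summary AS "
        f"SELECT date_trunc('{grain}', {ts_column}) AS period, "
        f"count(*) AS row_count "
        f"FROM {history_table} GROUP BY 1"
    )


if __name__ == "__main__":
    # Placeholder URL; a real run would use the pre-signed URL from the export.
    url = "https://example.com/export.parquet?signature=placeholder"
    for statement in build_append_statements(url):
        print(statement + ";")
    for grain in GRAINS:
        print(build_summary_statement(grain) + ";")
```

Generating the statements separately from sending them keeps the sketch independent of how the queries are submitted to Seafowl, and makes the SQL easy to inspect before a run.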
