state-fin-ingest

The goal of this package is to provide an easy way for researchers/analysts/data scientists to ingest finance data from state legislature elections in a repeatable and scalable way. Contributions and reports are loaded via state-specific "ingestors" which standardize and clean up the raw state-provided data. The contributions and reports are then flushed to an Elasticsearch cluster where further analysis can be performed. (Note: in the future, data could potentially be flushed to alternate data stores.)

At the moment, only a few state ingestors are available, but the goal is to be able to load contribution data for all 50 state legislatures.

Getting started

To get started, you need a .env file in the root directory containing an ES_HOST variable set to the full URL of the Elasticsearch cluster (including authentication).
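For example, a minimal .env might look like this (the host, port, and credentials below are placeholders):

    ES_HOST=https://elastic:changeme@my-cluster.example.com:9243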

You'll also need poetry installed on your local machine.

Running poetry install will install the dependencies.

Before running the ingestors, you will likely need to download the finance data for the states you are interested in. The directory for each state ingestor contains a README.md that describes the files required for that state's ingestion.

To begin ingestion, run poetry run python main.py
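Putting the setup steps together, a typical first run from the repository root looks like this:

    poetry install
    poetry run python main.py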

Gotchas / Known Issues / Caveats

  • We're only ingesting contributions + reports that are linked to a candidate, candidate committee, or committee that's registered to support a candidate
  • We're not ingesting data around expenditures or other campaign finance data, only contributions and reports
  • We're only ingesting data for state legislatures

About the ingestion process

Before an ingestor's state-specific ingestion work is performed, the main ingest process creates a temporary directory for the work. The ingest process also creates a new index in the Elasticsearch cluster whose name is prefixed with the state code specified by the ingestor and suffixed with a datetime string. (TODO: we should also validate that the files required by the ingestor exist before beginning the ingest process.)
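As a rough illustration (not the actual implementation), the working index name could be assembled like this; the state code and timestamp format are hypothetical:

    from datetime import datetime, timezone

    def build_working_index_name(state_code: str) -> str:
        # Prefix with the ingestor's state code, suffix with a datetime string,
        # e.g. "tx-20240201T120000" (format is illustrative).
        suffix = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
        return f"{state_code}-{suffix}"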

Next, the ingestor's pre() function is called, which performs any setup required to start. This could include building any necessary lookup dictionaries, downloading additional files, etc.

After the pre() work is complete, the ingestor's work() generator is called, which yields records until there are no more records to consume. After a set number of records, the accumulated records are flushed to the working Elasticsearch index created during setup.

After all records have been consumed, the ingestor's post() function is called to handle any spin-down work.
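A minimal sketch of the lifecycle described above, with hypothetical class and helper names (only pre(), work(), and post() come from this README; the batch size and flush details are assumptions):

    from typing import Callable, Iterator

    FLUSH_EVERY = 500  # illustrative batch size, not the project's actual setting

    class ExampleIngestor:
        state_code = "tx"  # hypothetical state code

        def pre(self) -> None:
            # Setup: build lookup dictionaries, download additional files, etc.
            pass

        def work(self) -> Iterator[dict]:
            # Yield standardized contribution/report records until exhausted.
            yield from []

        def post(self) -> None:
            # Spin-down work after all records have been consumed.
            pass

    def run_ingestor(ingestor: ExampleIngestor, flush_to_index: Callable[[list], None]) -> None:
        ingestor.pre()
        batch = []
        for record in ingestor.work():
            batch.append(record)
            if len(batch) >= FLUSH_EVERY:
                flush_to_index(batch)  # flush to the working Elasticsearch index
                batch = []
        if batch:
            flush_to_index(batch)
        ingestor.post()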

The ingest process then cleans up the temporary directory it created and promotes the working index to be the main index for the state. This is achieved by removing the old index from the state's index alias and adding the new index to the alias, which allows a fairly seamless transition from the old data to the new data.
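For reference, the alias swap could be expressed roughly like this with the elasticsearch Python client (the alias and index names are placeholders, and the project's actual code may differ):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("https://elastic:changeme@my-cluster.example.com:9243")

    # Atomically remove the old index from the state's alias and add the new one.
    # (Older 7.x clients take body={"actions": [...]} instead of actions=[...].)
    es.indices.update_aliases(actions=[
        {"remove": {"index": "tx-20240101T000000", "alias": "tx"}},
        {"add": {"index": "tx-20240201T000000", "alias": "tx"}},
    ])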
