This project aligns with the TodoFEC initiative to create a standardized set of data tasks for comparing data processing frameworks. We parse the FEC data, store it as Parquet files, and upload them to a public S3 bucket so that everyone can access the data easily.
The FEC data for this project is available as Parquet files in an S3 bucket, so you can query it with DuckDB without downloading anything first.
- Install duckdb
pip install duckdb
- Open duckdb
duckdb
- Run a Query: Use the following command to query the Parquet file directly from S3
select count(*) from read_parquet('s3://datarecce-todofec/pac_summary_2024.parquet');
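The same query can also be run from Python instead of the DuckDB shell. The following is a minimal sketch, assuming the `duckdb` Python package is installed and the bucket allows anonymous reads; the helper names `count_query` and `count_rows` are hypothetical and not part of this project.

```python
def count_query(uri: str) -> str:
    """Build the row-count query shown above for a given Parquet URI."""
    return f"select count(*) from read_parquet('{uri}');"


def count_rows(uri: str) -> int:
    """Run the count query through DuckDB's Python API (needs network access)."""
    # Imported lazily so count_query() works even without duckdb installed.
    import duckdb

    con = duckdb.connect()  # in-memory database
    # httpfs is the DuckDB extension that handles s3:// URIs.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    return con.execute(count_query(uri)).fetchone()[0]
```

Calling `count_rows("s3://datarecce-todofec/pac_summary_2024.parquet")` should return the same number as the shell query. Depending on your environment, you may also need to set the bucket's region in DuckDB before the query succeeds.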
Here are the S3 URIs of the available datasets:
s3://datarecce-todofec/all_candidates_2024.parquet
s3://datarecce-todofec/candidate_master_2024.parquet
s3://datarecce-todofec/candidate_committee_linkage_2024.parquet
s3://datarecce-todofec/house_senate_2024.parquet
s3://datarecce-todofec/committee_master_2024.parquet
s3://datarecce-todofec/pac_summary_2024.parquet
s3://datarecce-todofec/contributions_from_committees_to_candidates_2024.parquet
s3://datarecce-todofec/operating_expenditures_2024.parquet
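Since the URIs above share one naming pattern, they can be generated rather than typed out. Here is a small sketch that builds them and assembles a single query counting the rows of every dataset; the names `dataset_uri` and `row_count_query` are hypothetical helpers, and the query text is only built here, not executed.

```python
from typing import List

BUCKET = "datarecce-todofec"

# Dataset names taken from the URI list above (2024 cycle).
DATASET_NAMES: List[str] = [
    "all_candidates",
    "candidate_master",
    "candidate_committee_linkage",
    "house_senate",
    "committee_master",
    "pac_summary",
    "contributions_from_committees_to_candidates",
    "operating_expenditures",
]


def dataset_uri(name: str, year: int = 2024) -> str:
    """Return the S3 URI for one dataset, following the pattern listed above."""
    return f"s3://{BUCKET}/{name}_{year}.parquet"


def row_count_query(names: List[str] = DATASET_NAMES) -> str:
    """Build one SQL statement that reports the row count of each dataset."""
    parts = [
        f"select '{name}' as dataset, count(*) as row_count "
        f"from read_parquet('{dataset_uri(name)}')"
        for name in names
    ]
    return "\nunion all\n".join(parts) + ";"
```

Printing `row_count_query()` gives a statement you can paste into the DuckDB shell.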
Before you begin you'll need the following on your system:
Install the Python dependencies:
poetry install
Once installation has completed you can start parsing data.
poetry run python main.py
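To give a rough idea of what the parsing step involves: FEC bulk downloads are zip archives containing pipe-delimited text files without a header row. The sketch below reads such a file using only the standard library. It is an illustration under those assumptions, not the code in `main.py`; the function name `iter_rows` and the `member` argument are hypothetical.

```python
import csv
import io
import zipfile
from typing import Iterator, List


def iter_rows(zip_source, member: str) -> Iterator[List[str]]:
    """Yield rows from a pipe-delimited text file inside a zip archive.

    zip_source may be a path or a file-like object; member is the name of
    the text file inside the archive (an assumption, check the archive).
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(
                raw, encoding="utf-8", errors="replace", newline=""
            )
            # QUOTE_NONE keeps quote characters as literal text, so a stray
            # quote inside a field does not swallow the delimiters after it.
            yield from csv.reader(text, delimiter="|", quoting=csv.QUOTE_NONE)
```

Writing the rows out as Parquet needs an additional library and a column list for each file, both of which are left out of this sketch.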
tree --du -h datarecce-todofec/
[804M]  datarecce-todofec/
├── [354M]  parquet
│   ├── [173K]  all_candidates_2020.parquet
│   ├── [164K]  all_candidates_2024.parquet
│   ├── [ 86K]  candidate_committee_linkage_2024.parquet
│   ├── [330K]  candidate_master_2024.parquet
│   ├── [885K]  committee_master_2024.parquet
│   ├── [ 21M]  contributions_from_committees_to_candidates_2020.parquet
│   ├── [ 14M]  contributions_from_committees_to_candidates_2024.parquet
│   ├── [118K]  house_senate_2024.parquet
│   ├── [ 36M]  operating_expenditures_2024.parquet
│   ├── [449K]  pac_summary_2024.parquet
│   └── [281M]  transactions_between_committees_2024.parquet
└── [450M]  raw
    └── [450M]  bulk-downloads
        ├── [ 28M]  2020
        │   ├── [ 28M]  pas220.zip
        │   └── [179K]  weball20.zip
        └── [422M]  2024
            ├── [ 91K]  ccl24.zip
            ├── [855K]  cm24.zip
            ├── [343K]  cn24.zip
            ├── [ 45M]  oppexp24.zip
            ├── [356M]  oth24.zip
            ├── [ 19M]  pas224.zip
            ├── [169K]  weball24.zip
            ├── [448K]  webk24.zip
            └── [119K]  webl24.zip
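The raw archives and the Parquet files in the listing above line up one to one. The sketch below spells out that correspondence; the mapping is inferred from the FEC bulk-download file naming and the listing itself, so treat it as an assumption rather than a description of `main.py`. The names `ARCHIVE_TO_DATASET` and `parquet_name` are hypothetical.

```python
import re

# Archive prefix -> dataset name, inferred from FEC bulk-download naming.
ARCHIVE_TO_DATASET = {
    "weball": "all_candidates",
    "cn": "candidate_master",
    "ccl": "candidate_committee_linkage",
    "webl": "house_senate",
    "cm": "committee_master",
    "webk": "pac_summary",
    "pas2": "contributions_from_committees_to_candidates",
    "oppexp": "operating_expenditures",
    "oth": "transactions_between_committees",
}


def parquet_name(zip_name: str) -> str:
    """Map a raw archive name such as 'pas224.zip' to its Parquet file name."""
    # The last two digits before '.zip' are the election cycle (e.g. 24 -> 2024).
    match = re.fullmatch(r"(.+)(\d{2})\.zip", zip_name)
    if match is None:
        raise ValueError(f"unexpected archive name: {zip_name}")
    prefix, yy = match.groups()
    return f"{ARCHIVE_TO_DATASET[prefix]}_20{yy}.parquet"
```

For example, `parquet_name("cn24.zip")` yields `candidate_master_2024.parquet`, matching the entry in the `parquet` directory.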