dc-council-vote-track

The Council of the District of Columbia hosts a Legislative Information Management System (LIMS), which provides access to information about the Council's legislative activities via an API. However, the API has two limitations that this script works around:

The API structure makes it challenging to output a flat file of vote records. This script parses the API responses to create a flat CSV file that contains most of the information for each vote, including the votes of individual council members.
Some votes are recorded only in PDF files and not in the API. This script automatically downloads those PDF files and uses OCR and image analysis to parse them and decode the available vote records.
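
As a rough illustration of the flattening described above, here is a minimal sketch. The nested input shape and every field name in it (`votes`, `memberVotes`, `legislationNumber`, and so on) are assumptions made for this example, not the actual LIMS response schema, and the bill number and member names are placeholders.

```python
import csv
import io
from typing import Dict, List


def flatten_votes(bill: Dict) -> List[Dict[str, str]]:
    """Turn one nested bill record into one flat row per vote.

    The keys used here are hypothetical, not the real LIMS schema.
    """
    rows = []
    for vote in bill.get("votes", []):
        row = {
            "legislationNumber": bill["legislationNumber"],
            "voteDate": vote["date"],
            "result": vote["result"],
        }
        # One column per council member, holding that member's vote.
        for member in vote["memberVotes"]:
            row[member["name"]] = member["vote"]
        rows.append(row)
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Write rows as CSV text, using columns in first-seen order."""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data, not a real bill or real votes.
example_bill = {
    "legislationNumber": "B99-0001",
    "votes": [
        {
            "date": "2020-01-01",
            "result": "Approved",
            "memberVotes": [
                {"name": "Member A", "vote": "Yes"},
                {"name": "Member B", "vote": "No"},
            ],
        }
    ],
}

flat_rows = flatten_votes(example_bill)
csv_text = rows_to_csv(flat_rows)
print(csv_text)
```

Using one column per member keeps each vote on a single row, which is what makes the file easy to filter in a spreadsheet.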

Usage

Example

python3 dc_council.py <token> 23 1

Output

Example Output: Output CSV File

Details

python3 dc_council.py <token> <councilPeriodId> <legislationType>

token A LIMS developer access token (required). Register for one at https://lims.dccouncil.us/developerRegistration
councilPeriodId Council Period ID; see https://lims.dccouncil.us/api/help/index.html#!/PublicData/GetCouncilPeriods. Default: 24 (2021-2022). Periods 20-24 have been tested.
legislationType Legislation Type; see https://lims.dccouncil.us/api/help/index.html#!/PublicData/GetLegislationCategories. Default: 1 (Bill). Type 1 (Bill) has been tested.
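
The argument handling described above could be expressed with `argparse` roughly as follows. This is a sketch only: it assumes nothing about how dc_council.py actually reads its arguments, and it simply encodes the documented defaults (24 and 1) as optional positional arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command line (illustrative)."""
    parser = argparse.ArgumentParser(prog="dc_council.py")
    parser.add_argument("token", help="LIMS developer access token (required)")
    # nargs="?" makes a positional argument optional so its default applies.
    parser.add_argument("councilPeriodId", nargs="?", type=int, default=24,
                        help="Council Period ID (default: 24)")
    parser.add_argument("legislationType", nargs="?", type=int, default=1,
                        help="Legislation Type (default: 1, Bill)")
    return parser


# Equivalent to: python3 dc_council.py MY_TOKEN 23 1
explicit = build_parser().parse_args(["MY_TOKEN", "23", "1"])
# Equivalent to: python3 dc_council.py MY_TOKEN
defaults = build_parser().parse_args(["MY_TOKEN"])

print(explicit.councilPeriodId, explicit.legislationType)  # 23 1
print(defaults.councilPeriodId, defaults.legislationType)  # 24 1
```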

Design

Since LIMS imposes rate limiting and PDF processing is slow, we use a basic CSV file to keep track of progress when processing data about each individual bill. In the event of an issue during a script run, the script can be restarted and will pick up where it left off.
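
A minimal sketch of this kind of resumable processing is shown below. The one-column file layout and the helper names (`load_done`, `mark_done`, `process_all`) are assumptions made for illustration; the script's actual progress file may be organized differently.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, List, Set


def load_done(progress_path: str) -> Set[str]:
    """Return the IDs of bills already recorded as finished."""
    if not os.path.exists(progress_path):
        return set()
    with open(progress_path, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}


def mark_done(progress_path: str, bill_id: str) -> None:
    """Append one finished bill ID to the progress file."""
    with open(progress_path, "a", newline="") as f:
        csv.writer(f).writerow([bill_id])


def process_all(bill_ids: Iterable[str], progress_path: str,
                process_bill: Callable[[str], None]) -> List[str]:
    """Process every bill not yet marked done; return the IDs handled now."""
    done = load_done(progress_path)
    handled = []
    for bill_id in bill_ids:
        if bill_id in done:
            continue  # finished in an earlier run, so skip it
        process_bill(bill_id)
        # Record progress only after the bill was processed successfully.
        mark_done(progress_path, bill_id)
        handled.append(bill_id)
    return handled


with tempfile.TemporaryDirectory() as tmp:
    progress = os.path.join(tmp, "progress.csv")
    mark_done(progress, "B1")  # pretend an earlier run finished B1
    second_run = process_all(["B1", "B2"], progress, lambda bill_id: None)
    finished = load_done(progress)

print(second_run)  # ['B2']
```

Appending one row per finished bill means an interrupted run loses at most the bill that was in flight.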

Data about each bill is stored locally in a pickle file in the data/legislationType_councilPeriodID/ directory.

After analysis is complete for every bill, the stored data for each bill is reloaded and written out as a single CSV file.
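
The store-then-merge flow can be sketched as below. The directory naming follows the `legislationType_councilPeriodID` pattern described above; the per-bill file name (`<bill id>.pkl`), the row format, and the helper names are assumptions for this example, and the bill IDs are placeholders.

```python
import csv
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List


def bill_dir(root: Path, legislation_type: int, council_period_id: int) -> Path:
    """Directory for one run, e.g. data/1_23 for type 1, period 23."""
    return Path(root) / f"{legislation_type}_{council_period_id}"


def save_bill(directory: Path, bill_id: str, rows: List[Dict[str, str]]) -> None:
    """Pickle the rows for one bill (file naming is hypothetical)."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / f"{bill_id}.pkl", "wb") as f:
        pickle.dump(rows, f)


def merge_to_csv(directory: Path, out_path: Path) -> int:
    """Reload every pickled bill and write all rows to one CSV file."""
    rows: List[Dict[str, str]] = []
    for pkl in sorted(directory.glob("*.pkl")):
        with open(pkl, "rb") as f:
            rows.extend(pickle.load(f))
    # Union of all columns, in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    directory = bill_dir(Path(tmp) / "data", 1, 23)
    save_bill(directory, "B99-0001", [{"bill": "B99-0001", "Member A": "Yes"}])
    save_bill(directory, "B99-0002", [{"bill": "B99-0002", "Member A": "No"}])
    out_file = Path(tmp) / "votes.csv"
    row_count = merge_to_csv(directory, out_file)
    merged_lines = out_file.read_text().splitlines()

print(row_count)  # 2
```

Note that pickle files should only be loaded when they come from a trusted source, such as the script's own earlier runs.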

PDF Processing

PDF processing was designed specifically around amendment votes in the Committee of the Whole in council period 23 (e.g. B23-0760) where vote tallies were recorded in PDF format only. Multiple votes can be tallied per PDF, so we scan every page for potential votes.

Convert the PDF into a series of images.
Analyze each page with the Tesseract Optical Character Recognition (OCR) library, looking for a string that is present on pages that record votes.
Use OCR to read council member names and identify the locations of the dots that indicate the different vote outcomes (Yes, No, Present, Absent).
Read the color of the pixels at those target locations, then determine whether each location is blank or has a blue dot indicating the vote outcome.
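
The final step above, deciding whether a sampled location is blank or holds a blue dot, might look like the sketch below. It covers only that step and works on RGB tuples that have already been read from the page image; PDF conversion and OCR are not shown. The threshold value, the function names, and the rule that exactly one outcome must be marked are assumptions for illustration, not the script's actual logic.

```python
from typing import Dict, Optional, Tuple

RGB = Tuple[int, int, int]


def is_blue(pixel: RGB, margin: int = 50) -> bool:
    """True when the blue channel clearly dominates red and green.

    The margin of 50 is an arbitrary illustrative threshold.
    """
    red, green, blue = pixel
    return blue - max(red, green) >= margin


def decode_vote(samples: Dict[str, RGB]) -> Optional[str]:
    """Return the one outcome whose sampled pixel is blue, else None.

    `samples` maps an outcome label to the pixel read at its dot location.
    None signals a blank or ambiguous row that needs manual review.
    """
    marked = [outcome for outcome, pixel in samples.items() if is_blue(pixel)]
    return marked[0] if len(marked) == 1 else None


# Pixels sampled for one council member's row (made-up values).
member_row = {
    "Yes": (255, 255, 255),
    "No": (30, 60, 200),
    "Present": (250, 250, 250),
    "Absent": (255, 255, 255),
}
blank_row = {outcome: (255, 255, 255) for outcome in member_row}

print(decode_vote(member_row))  # No
print(decode_vote(blank_row))   # None
```

Comparing channels against each other, instead of matching one exact color, tolerates the slight color variation that scanning and rasterization tend to introduce.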

See an example image below: the red pixels were analyzed to determine the vote outcome.

LIMS Information

https://lims.dccouncil.us/
https://lims.dccouncil.us/api/help/index.html (API Info)
https://lims.dccouncil.us/developerRegistration (Developer Authorization)
