Standardize CVR identification across audit logs and RCTab output #913

Draft
wants to merge 13 commits into
base: develop

Conversation

@nurse-the-code (Collaborator) commented Jan 9, 2025

Key Changes:

  • Fixes the getId method to use the logic that prints the "RCTab CVR Id" to the audit log.
    • Uses computedId as the primary identifier, with fallback to suppliedId
  • Adds a getSuppliedId method to CVR to print suppliedId to rctab_cvr.csv as "Vendor Id"
  • Updates both audit logs and rctab_cvr.csv to use the getId method
  • Changes Dominion computedId delimiter from pipe (|) to dash (-) for improved readability
  • Adds column "RCTab CVR Id" to RCTab CVR output
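
The ID logic described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name `CvrIdSketch` is hypothetical, the real CastVoteRecord class has many more fields, and whether getSuppliedId returns null or an empty string when no vendor ID exists is an assumption here (this sketch returns the raw value and leaves null handling to the caller).

```java
// Hypothetical stand-in for RCTab's CastVoteRecord; only the ID logic is sketched.
final class CvrIdSketch {
  private final String computedId; // built by RCTab, e.g. "tabulator-batch-record"
  private final String suppliedId; // taken from the vendor's CVR, may be null

  CvrIdSketch(String computedId, String suppliedId) {
    this.computedId = computedId;
    this.suppliedId = suppliedId;
  }

  static boolean isNullOrBlank(String s) {
    return s == null || s.isBlank();
  }

  // Primary identifier: computedId when usable, otherwise suppliedId.
  String getId() {
    return !isNullOrBlank(computedId) ? computedId : suppliedId;
  }

  // Vendor identifier only; assumed to be returned as-is, with callers handling null.
  String getSuppliedId() {
    return suppliedId;
  }

  public static void main(String[] args) {
    // computedId is present, so it is used
    System.out.println(new CvrIdSketch("7-12-345", "vendor-9").getId());
    // computedId is whitespace, so the vendor ID is used instead
    System.out.println(new CvrIdSketch("   ", "vendor-9").getId());
  }
}
```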

Benefits:

  • Same ballot maintains the same identifier across multiple tabulations by using centralized getId logic
  • Easier to track specific ballots across audit logs and RCTab CVR output
  • More intuitive ID format using dashes instead of pipes
  • Moving toward consistent identification approach throughout the application

Known Issues:

  • See the TODOs in the code asking whether we should use the updated getId method, the getComputedId method, or something else.

Testing Needed:

  1. Verify ID consistency:
    • ✅ Compare IDs in audit log with rctab_cvr.csv output
    • ✅ Ensure id returned from getId method is consistent across multiple tabulations
    • ✅ Check ID format follows tabulator-batch-record pattern for records from Dominion CVR sources
  2. Test fallback behavior:
    • ✅ Cases where computedId is null
    • ✅ Cases where computedId is empty or whitespace
    • ✅ Cases where only suppliedId is available
  3. Test with each CVR provider
    • ✅ CDF/Unisyn
    • ✅ Clear Ballot
    • ✅ Dominion (tested with 2024 Alaska US House race)
    • ✅ ES&S (in test suite)
    • ✅ Hart
    • ✅ generic CSV
  4. Review and update failing tests:
    • ✅ Done for now
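
One way the ID-consistency check in item 1 could be automated is sketched below. It is only an illustration: `CvrIdColumnCheck` and `extractIds` are hypothetical names, and the naive comma split assumes no quoted fields containing commas, which a real check would handle with a proper CSV parser. The sample lines are taken from the ES&S test output in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pulls the "RCTab CVR Id" column out of rctab_cvr.csv lines.
final class CvrIdColumnCheck {
  static List<String> extractIds(List<String> csvLines) {
    // Locate the ID column by header name rather than by position.
    List<String> header = Arrays.asList(csvLines.get(0).split(",", -1));
    int idColumn = header.indexOf("RCTab CVR Id");
    if (idColumn < 0) {
      throw new IllegalArgumentException("No \"RCTab CVR Id\" column in header");
    }
    List<String> ids = new ArrayList<>();
    for (String line : csvLines.subList(1, csvLines.size())) {
      // limit -1 keeps trailing empty cells
      ids.add(line.split(",", -1)[idColumn]);
    }
    return ids;
  }

  public static void main(String[] args) {
    List<String> run = List.of(
        "Source Filepath,CVR Provider,Contest Id,RCTab CVR Id,Tabulator Id,Batch Id,"
            + "Vendor Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3",
        "../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-1,,,,,,"
            + "Mookie Blaylock,undervote,undervote");
    // In a real check, firstRun and secondRun would come from two separate tabulations.
    List<String> firstRun = extractIds(run);
    List<String> secondRun = extractIds(run);
    System.out.println("IDs match across runs: " + firstRun.equals(secondRun));
  }
}
```

The same list of IDs could then be searched for in the audit log text to confirm that both outputs refer to each ballot by the same identifier.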

Feature Request: Add Standardized Unique Identifiers for Cast Vote Records
#911
Replace UUID-based identification with a more consistent ID system that
uses the same identifier across audit logs and RCTab CVR output. The ID
is primarily derived from the computedId (tabulator-batch-record) with
fallback to suppliedId when computedId is unavailable.

Unlike UUIDs which change between tabulations, this approach ensures the
same ballot maintains the same identifier across multiple tabulations,
making it easier to track specific ballots across different runs.

Changes:
- Remove UUID generation for CVR identification
- Add getPrimaryId() method to centralize ID logic
- Update audit log and RCTab CVR to use the same ID format
- Change ID delimiter from pipe to dash for better readability
- Add TODO to review legacy getId() usage

Note: This change ensures that CVR IDs are consistent between audit logs,
RCTab CVR output, and repeated tabulations, making it easier to track
individual ballots across different output formats and multiple runs.
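
To illustrate why a computed ID stays stable where a UUID does not, here is a small sketch. The helper name `computedId` and its three parameters are assumptions made for illustration; the PR only states that the Dominion ID follows a tabulator-batch-record pattern joined by dashes.

```java
import java.util.UUID;

// Hypothetical sketch contrasting a derived ID with a random UUID.
final class StableIdSketch {
  // Joins the three components with a dash, following the tabulator-batch-record pattern.
  static String computedId(String tabulatorId, String batchId, String recordId) {
    return String.join("-", tabulatorId, batchId, recordId);
  }

  public static void main(String[] args) {
    // The derived ID depends only on the ballot's own data, so every run agrees.
    String firstRun = computedId("7", "12", "345");
    String secondRun = computedId("7", "12", "345");
    System.out.println(firstRun + " equals " + secondRun + ": " + firstRun.equals(secondRun));

    // A random UUID is regenerated on each tabulation, so runs do not agree.
    System.out.println(UUID.randomUUID() + " vs " + UUID.randomUUID());
  }
}
```

One trade-off worth keeping in mind: if any component can itself contain a dash, the joined ID can no longer be split back into its parts unambiguously, though it still works as a label for tracking a ballot.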
Comment on lines 132 to 134
String getPrimaryId() {
return !isNullOrBlank(computedId) ? computedId : suppliedId;
}
Collaborator Author

This is a refactoring of the computedId and suppliedId logic used below in the logStringBuilder (which should print a unique CVR Id to the audit log).

@@ -981,6 +984,7 @@ private Map<String, Object> generateCvrSnapshotMap(
entry("@type", "CVR.CVRContest"));

return Map.ofEntries(
// TODO: Do we want this to use the getSuppliedId or the getId method?
entry("@id", generateCvrSnapshotId(sanitizeStringForOutput(cvr.getId()), round)),
@nurse-the-code (Collaborator, Author) commented Jan 22, 2025

Our choice here should be the same as in the TODO comment added before the sanitizedId definition, so long as we are offering a unique CVR id to include in @id.

@@ -875,6 +877,7 @@ private List<Map<String, Object>> generateCdfMapForCvrs(List<CastVoteRecord> cas
for (CastVoteRecord cvr : castVoteRecords) {
List<Map<String, Object>> cvrSnapshots = new LinkedList<>();
cvrSnapshots.add(generateCvrSnapshotMap(cvr, null, null));
// TODO: Do we want this to use the getSuppliedId or the getId method?
@nurse-the-code (Collaborator, Author) commented Jan 22, 2025

In researching this question, I looked at the NIST Cast Vote Records Common Data Format Specification v 1.0 document. Here are some relevant parts I found.

  • on page 18:
    Screenshot 2025-01-22 at 05 52 02
  • on page 31:
    Screenshot 2025-01-22 at 05 53 46

This prints out the BallotPrePrintedId from the CDF's CVR class. It would seem to me that this should be a vendor-supplied id (e.g. suppliedId). I am not sure that we were using this correctly prior to this PR, because we would sometimes print the computedId (which was never pre-printed on any ballot). I am also not sure what this should look like if there is no vendor-supplied id.

Comment on lines -1 to +11
Source Filepath,CVR Provider,Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-1,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-2,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-3,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-4,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-5,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-6,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-7,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-8,,,George Gervin,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-9,,,George Gervin,Yinka Dare,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-10,,,Sedale Threatt,George Gervin,undervote
Source Filepath,CVR Provider,Contest Id,RCTab CVR Id,Tabulator Id,Batch Id,Vendor Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-1,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-2,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-3,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-4,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-5,,,,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-6,,,,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-7,,,,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-8,,,,,,George Gervin,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-9,,,,,,George Gervin,Yinka Dare,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-10,,,,,,Sedale Threatt,George Gervin,undervote
@nurse-the-code (Collaborator, Author) commented Jan 22, 2025

For ES&S CVRs, it looks like our new getSuppliedId method (slightly changed from the original getId method) results in no "Vendor Id" (formerly "Record Id") getting printed to the rctab_cvr.csv. Instead, that same value is printed as the "RCTab CVR Id" (using the new getId method).

This is because the old getId method (and the new getId method as well) returns a computedId for ES&S CVRs. The new getSuppliedId returns an empty string, because ES&S CVRs don't have suppliedId associated with them in RCTab.
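
The ES&S behavior described above can be reproduced with a small sketch that assembles one row in the new column order. Everything here is illustrative: `EssRowSketch` and `buildRow` are hypothetical names, the row is joined naively without CSV quoting, and null handling is done at the call site as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one rctab_cvr.csv row for a CVR with no vendor-supplied ID.
final class EssRowSketch {
  static String nullToEmpty(String s) {
    return s == null ? "" : s;
  }

  // Column order follows the new header: Source Filepath, CVR Provider, Contest Id,
  // RCTab CVR Id, Tabulator Id, Batch Id, Vendor Id, Precinct, Precinct Portion, ranks.
  static String buildRow(
      String sourcePath, String provider, String computedId, String suppliedId,
      List<String> rankings) {
    boolean hasComputedId = computedId != null && !computedId.isBlank();
    String rctabCvrId = hasComputedId ? computedId : suppliedId;
    List<String> cells = new ArrayList<>();
    cells.add(sourcePath);
    cells.add(provider);
    cells.add(""); // Contest Id (empty in this test file)
    cells.add(nullToEmpty(rctabCvrId)); // RCTab CVR Id
    cells.add(""); // Tabulator Id
    cells.add(""); // Batch Id
    cells.add(nullToEmpty(suppliedId)); // Vendor Id: empty when no suppliedId exists
    cells.add(""); // Precinct
    cells.add(""); // Precinct Portion
    cells.addAll(rankings);
    return String.join(",", cells);
  }

  public static void main(String[] args) {
    // ES&S case: a computedId exists but no suppliedId, so "Vendor Id" stays empty.
    System.out.println(buildRow(
        "../_shared/simple_ess_cvr.xlsx", "ess", "simple_ess_cvr.xlsx-1", null,
        List.of("Mookie Blaylock", "undervote", "undervote")));
  }
}
```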

@@ -696,14 +696,15 @@ String writeRcTabCvrCsv(
CSVFormat format = CSVFormat.DEFAULT.builder().setNullString("").build();
csvPrinter = new CSVPrinter(writer, format);
// print header:
// ContestId, TabulatorId, BatchId, RecordId, Precinct, Precinct Portion, rank 1 selection,
// rank 2 selection, ... rank maxRanks selection
// RCTab CVR Id, ContestId, TabulatorId, BatchId, RecordId, Precinct, Precinct Portion,
Collaborator Author

reorder the variables in this comment

null. Null handling is now handled where getSuppliedId is called.
Development

Successfully merging this pull request may close these issues.

Feature Request: Add common id to both audit log and rctab_cvr.csv
2 participants