-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize CVR identification across audit logs and RCTab output #913
base: develop
Are you sure you want to change the base?
Conversation
Feature Request: Add Standardized Unique Identifiers for Cast Vote Records #911
Replace UUID-based identification with a more consistent ID system that uses the same identifier across audit logs and RCTab CVR output. The ID is primarily derived from the computedId (tabulator-batch-record) with fallback to suppliedId when computedId is unavailable. Unlike UUIDs which change between tabulations, this approach ensures the same ballot maintains the same identifier across multiple tabulations, making it easier to track specific ballots across different runs. Changes: - Remove UUID generation for CVR identification - Add getPrimaryId() method to centralize ID logic - Update audit log and RCTab CVR to use the same ID format - Change ID delimiter from pipe to dash for better readability - Add TODO to review legacy getId() usage Note: This change ensures that CVR IDs are consistent between audit logs, RCTab CVR output, and repeated tabulations, making it easier to track individual ballots across different output formats and multiple runs.
String getPrimaryId() { | ||
return !isNullOrBlank(computedId) ? computedId : suppliedId; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a refactoring of the computedId
and suppliedId
logic used below in the logStringBuilder
(which should print a unique CVR Id to the audit log.
…ture/issue-911-rctab-cvr-primary-id
@@ -981,6 +984,7 @@ private Map<String, Object> generateCvrSnapshotMap( | |||
entry("@type", "CVR.CVRContest")); | |||
|
|||
return Map.ofEntries( | |||
// TODO: Do we want this to use the getSuppliedId or the getId method? | |||
entry("@id", generateCvrSnapshotId(sanitizeStringForOutput(cvr.getId()), round)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our choice here should be the same as in the TODO
comment added before the sanitizedId
definition, so long as we are offering a unique CVR id to include in @id
.
@@ -875,6 +877,7 @@ private List<Map<String, Object>> generateCdfMapForCvrs(List<CastVoteRecord> cas | |||
for (CastVoteRecord cvr : castVoteRecords) { | |||
List<Map<String, Object>> cvrSnapshots = new LinkedList<>(); | |||
cvrSnapshots.add(generateCvrSnapshotMap(cvr, null, null)); | |||
// TODO: Do we want this to use the getSuppliedId or the getId method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In researching this question, I looked at the NIST Cast Vote Records Common Data Format Specification v 1.0 document. Here are some relevant parts I found.
This prints out the BallotPrePrintedId
from the CDF's CVR class. It would seem to me that this should be a vendor-supplied id (e.g. suppliedId
). I am not sure that we were using this correctly prior to this PR, because we would sometimes print the computedId
(which was never pre-printed on any ballot). I am also not sure what this should look like if there is no vender-supplied id.
Source Filepath,CVR Provider,Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3 | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-1,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-2,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-3,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-4,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-5,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-6,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-7,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-8,,,George Gervin,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-9,,,George Gervin,Yinka Dare,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-10,,,Sedale Threatt,George Gervin,undervote | ||
Source Filepath,CVR Provider,Contest Id,RCTab CVR Id,Tabulator Id,Batch Id,Vendor Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3 | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-1,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-2,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-3,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-4,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-5,,,,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-6,,,,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-7,,,,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-8,,,,,,George Gervin,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-9,,,,,,George Gervin,Yinka Dare,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-10,,,,,,Sedale Threatt,George Gervin,undervote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ES&S CVRs, it looks like our new getSuppliedId
method (slightly changed from the original getId
method results in no "Vendor Id"
(formerly "Record Id"
) getting printed to the rctab_cvr.csv
. Instead, that same value is printed as the "RCTab CVR Id"
(using the new getId
method).
This is because the old getId
method (and the new getId
method as well) returns a computedId
for ES&S CVRs. The new getSuppliedId
returns an empty string, because ES&S CVRs don't have suppliedId
associated with them in RCTab.
…ture/issue-911-rctab-cvr-primary-id
@@ -696,14 +696,15 @@ String writeRcTabCvrCsv( | |||
CSVFormat format = CSVFormat.DEFAULT.builder().setNullString("").build(); | |||
csvPrinter = new CSVPrinter(writer, format); | |||
// print header: | |||
// ContestId, TabulatorId, BatchId, RecordId, Precinct, Precinct Portion, rank 1 selection, | |||
// rank 2 selection, ... rank maxRanks selection | |||
// RCTab CVR Id, ContestId, TabulatorId, BatchId, RecordId, Precinct, Precinct Portion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reorder the variables in this comment
null. Null handling is now handled where getSuppliedId is called.
Key Changes:
getId
method to use logic that prints"RCTab CVR Id"
to audit log.computedId
as primary identifier with fallback to suppliedIdgetSuppliedId
method to CVR to printcomputeId
torctab_cvr.csv
as `"Vendor Id"rctab_cvr.csv
to use thegetId
methodcomputedId
delimiter from pipe (|
) to dash (-
) for improved readability"RCTab CVR Id"
to RCTab CVR outputBenefits:
getId
logicKnown Issues:
getId
method, thegetComputedId
, or something else.Testing Needed:
rctab_cvr.csv
outputgetId
method is consistent across multiple tabulations