Using ARKs for version control, initial draft proposal #887

emmetaobrien · 2024-04-12T17:05:14Z

Intended functionality:

A dataset version changes when the data provider specifies it should.
A new ARK is generated when the dataset version changes.
The ARKs of previous versions link to the appropriate GitHub commit.

The system needs to support the following use cases:

A) Dataset is updated with new scientific content.
B) Technical updates (e.g. extra properties added to DATS.json for CONP internal purposes) that make no difference to scientific content of dataset.
C) Removal of data from all versions of dataset (e.g. patients withdrawing from a study).

Proposed implementation:

Define somewhere to store archived ARKs for each dataset. This could be something like extraProperties=>archivedVersions in the DATS.json, an additional config file, or some other location. The text below refers to this store as archivedVersions; the name is a placeholder.

Handling case A:

This function is triggered when a user changes an existing value in the version field. (I believe population of the version field on initial data submission can remain as currently implemented; some CONP datasets refer to concluded projects and no further updates are envisioned.)

Workflow:

Prepare archival data for the previous version of the dataset (version label, ARK, relevant GitHub commit, date?, potential other information??).
Check for the existence of archivedVersions for this dataset.
If NO, initialise archivedVersions and write the archival data as its first entry.
If YES, add the archival data as another entry in archivedVersions.
Save DATS.json with the updated version number (as currently happens).
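
The workflow above could be sketched roughly as follows. This is only an illustration under assumptions: the function name archive_previous_version, the entry keys (version, ark, commit, date), and storing archivedVersions as a top-level list in the DATS dictionary are all hypothetical, since the proposal leaves the storage location open. The ARK and commit values in the usage note are placeholders.

```python
import copy
from datetime import date


def archive_previous_version(dats, new_version, previous_ark, previous_commit,
                             archived_on=None):
    """Return a copy of `dats` with the old version archived and `version` updated.

    Hypothetical sketch: `archivedVersions` is assumed to be a top-level list.
    """
    updated = copy.deepcopy(dats)
    old_version = updated.get("version")
    if old_version == new_version:
        # The version field did not change, so there is nothing to archive.
        return updated

    # Archival data for the previous version (field names are assumptions).
    entry = {
        "version": old_version,
        "ark": previous_ark,
        "commit": previous_commit,
        "date": (archived_on or date.today()).isoformat(),
    }
    # Initialise archivedVersions if it does not exist yet, then add the entry.
    updated.setdefault("archivedVersions", []).append(entry)
    updated["version"] = new_version
    return updated
```

For example, calling archive_previous_version({"version": "1.0"}, "2.0", "ark:/99999/fk4demo1", "abc1234") would return a dictionary whose version is "2.0" and whose archivedVersions list holds one entry describing version "1.0".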

interface changes:

if archivedVersions exists for dataset, display one line for each entry containing the archival data. (Decide where in the dataset page to show this, and how.)
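
As a rough illustration of the "one line per entry" display, the sketch below turns archived entries into plain-text lines. The function name format_archived_versions, the entry keys (version, ark, commit, date), and the line layout are assumptions made for this example; only the GitHub commit URL pattern (repository URL followed by /commit/ and the hash) is standard. Where and how the lines appear on the dataset page is still to be decided.

```python
def format_archived_versions(archived_versions, repo_url):
    """Build one display line per archived version (hypothetical layout)."""
    base = repo_url.rstrip("/")
    lines = []
    for entry in archived_versions:
        # GitHub serves a commit at <repository URL>/commit/<hash>.
        commit_url = f"{base}/commit/{entry['commit']}"
        lines.append(
            f"Version {entry['version']} | {entry['ark']} | "
            f"{commit_url} | archived {entry['date']}"
        )
    return lines
```

A dataset with no archivedVersions would simply produce an empty list, so the interface could skip the section entirely in that case.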

Case B will be carried out by CONP developers on a case-by-case basis and is generally not envisioned as updating versioning at this time.

Case C requires retroactive adjustments to all versions of a dataset that we serve. @samirdas suggested that removing the data from the underlying dataset would suffice, though this will leave broken links and error messages in a DataLad download.

This is very preliminary, feedback much appreciated.


github-actions · 2024-09-10T02:28:02Z

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

emmetaobrien · 2024-11-04T18:42:31Z

Additional clarification, because on rereading I realise I did not explicitly specify this.

archivedVersions should contain two fields:
archivedVersions->count, which starts at 1 for the first version, and is incremented by one for each following version.
archivedVersions->label, which is the label for that version displayed in the interface; I suggest this can default to the value of count.

The idea here is that the automatic logic specified above works on count. Each value of count has an associated label, which can be anything the user likes. That way we don't have to worry about parsing whether the 'label' after "1.0" is "2.0", "1.1", "1.01", or "1.0-update-2024-Nov-04", and our code can be agnostic about each individual user's preferred version-labelling format.
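
A minimal sketch of the count/label split, assuming each version record is a dictionary with count and label keys (the function name next_version_record and this record shape are hypothetical). The automatic logic only looks at count; the label is passed through untouched and falls back to the count when the user supplies none.

```python
def next_version_record(existing_records, label=None):
    """Return the count/label record for the next version (hypothetical shape)."""
    # count starts at 1 for the first version and increases by one each time.
    count = max((rec["count"] for rec in existing_records), default=0) + 1
    # The label is free-form text; it defaults to the value of count.
    return {"count": count, "label": label if label is not None else str(count)}
```

Because ordering relies on count alone, labels such as "1.1" and "1.0-update-2024-Nov-04" never need to be parsed or compared.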

emmetaobrien added the enhancement (New feature or request) and Discussion Required labels Apr 12, 2024

emmetaobrien assigned samirdas, cmadjar, laemtl, GHPBZ, bryancaron and carona898 Apr 12, 2024

emmetaobrien self-assigned this Apr 24, 2024

github-actions bot added the Stale label Sep 10, 2024

emmetaobrien removed the Stale label Sep 10, 2024
