Dataset Migration API documentation improvements #11192

28 changes: 26 additions & 2 deletions doc/sphinx-guides/source/_static/api/dataset-migrate.jsonld
@@ -1,11 +1,31 @@
{
"citation:depositor": "Admin, Dataverse",
"title": "Test Dataset",
"socialscience:collectionMode": [
"demonstration"
],
"subject": "Computer and Information Science",
"geospatial:geographicCoverage": [
{
"geospatial:otherGeographicCoverage": "Cambridge"
},
{
"geospatial:otherGeographicCoverage": "Massachusetts"
}
],
"author": {
"citation:authorName": "Admin, Dataverse",
"citation:authorAffiliation": "GDCC"
},
"kindOfData": "demonstration data",
"citation:keyword": [
{
"citation:keywordValue": "first keyword"
},
{
"citation:keywordValue": "second keyword"
}
],
"dateOfDeposit": "2020-10-08",
"citation:distributor": {
"citation:distributorName": "Demo Dataverse Repository",
@@ -35,5 +55,9 @@
"title": "http://purl.org/dc/terms/title",
"citation": "https://dataverse.org/schema/citation/",
"dvcore": "https://dataverse.org/schema/core#",
"schema": "http://schema.org/"
}}
"schema": "http://schema.org/",
"geospatial": "dataverse.siteUrl/schema/geospatial#",
"socialscience": "dataverse.siteUrl/schema/socialscience#",
"kindOfData": "http://rdf-vocabulary.ddialliance.org/discovery#kindOfData"
}
}
16 changes: 16 additions & 0 deletions doc/sphinx-guides/source/_static/api/transform-oai-ore-jsonld.xq
@@ -0,0 +1,16 @@
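declare option output:method "json";

let $parameters := map { 'method': 'json' }
for $record in /json
let $metadata := $record/ore_003adescribes

(: Lift the children of ore:describes to the root level and keep @context. :)
let $json :=
<json type="object">
{$metadata/*}
{$record/_0040context}
</json>

(: Write the result only when the record has an ore:describes element. :)
return if ($metadata) then file:write("converted.json", $json, $parameters)
else ()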
21 changes: 16 additions & 5 deletions doc/sphinx-guides/source/developers/dataset-migration-api.rst
@@ -5,10 +5,15 @@ The Dataverse software includes several ways to add Datasets originally created

This experimental migration API offers an additional option with some potential advantages:

* metadata can be specified using the json-ld format used in the OAI-ORE metadata export
* existing publication dates and PIDs are maintained (currently limited to the case where the PID can be managed by the Dataverse software, e.g. where the authority and shoulder match those the software is configured for)
* updating the PID at the provider can be done immediately or later (with other existing APIs)
* adding files can be done via the standard APIs, including using direct-upload to S3
* Metadata can be specified using the json-ld format used in the OAI-ORE metadata export. Please note that the json-ld generated by the OAI-ORE metadata export is not directly compatible with the Migration API. The OAI-ORE export nests resource metadata under an :code:`ore:describes` wrapper, whereas the Dataset Migration API requires the metadata to be at the root level. Please check the example file below for reference.

* If you need a tool to convert OAI-ORE exported json-ld into a format compatible with the Dataset Migration API, or if you need to generate compatible json-ld from sources other than an existing Dataverse installation, the `BaseX <http://basex.org>`_ database engine, used together with the XQuery language, provides an efficient solution. Please see the example script :download:`transform-oai-ore-jsonld.xq <../_static/api/transform-oai-ore-jsonld.xq>` for a simple conversion from exported OAI-ORE json-ld to a version compatible with the Dataset Migration API.

* Existing publication dates and PIDs are maintained (currently limited to the case where the PID can be managed by the Dataverse software, e.g. where the authority and shoulder match those the software is configured for).

* Updating the PID at the provider can be done immediately or later (with other existing APIs).

* Adding files can be done via the standard APIs, including using direct-upload to S3.
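
The unwrapping described in the first bullet can also be sketched in Python for readers who do not have BaseX available. This is a minimal sketch, not part of the Dataverse software: the function name ``unwrap_ore_describes`` is hypothetical, and it assumes the export is a JSON object with an ``ore:describes`` object and a top-level ``@context``. It lifts the contents of ``ore:describes`` to the root and carries ``@context`` along, without attempting to decide whether any of the lifted keys should be dropped.

```python
import json


def unwrap_ore_describes(ore_export: dict) -> dict:
    """Lift the metadata nested under "ore:describes" to the root level.

    Hypothetical helper: assumes "ore:describes" holds a JSON object and
    that "@context", if present, sits at the top level of the export.
    """
    described = ore_export.get("ore:describes")
    if not isinstance(described, dict):
        raise ValueError("no 'ore:describes' object found in the export")

    converted = dict(described)  # shallow copy, so the input is not modified
    if "@context" in ore_export:
        converted["@context"] = ore_export["@context"]
    return converted


# Tiny illustrative input, far smaller than a real export.
example = {
    "@context": {"title": "http://purl.org/dc/terms/title"},
    "ore:describes": {"title": "Test Dataset"},
}
print(json.dumps(unwrap_ore_describes(example), indent=2))
```

For a real export, the input would be read with ``json.load`` and the result written back with ``json.dump``.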

This API consists of 2 calls: one to create an initial Dataset version, and one to 'republish' the dataset through Dataverse with a specified publication date.
Both calls require super-admin privileges.
@@ -31,7 +36,13 @@ To import a dataset with an existing persistent identifier (PID), the provided j

curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:startmigration --upload-file dataset-migrate.jsonld
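
For scripted migrations, the same call could be prepared from Python. The sketch below only builds the request and does not send it. The function name and its parameters are hypothetical, and the ``Content-Type: application/ld+json`` header is an assumption that is not part of the curl command above.

```python
import urllib.request


def build_startmigration_request(server_url, dataverse_id, api_token, jsonld_bytes):
    """Build (but do not send) the POST request for the :startmigration call."""
    url = (
        f"{server_url.rstrip('/')}/api/dataverses/"
        f"{dataverse_id}/datasets/:startmigration"
    )
    return urllib.request.Request(
        url,
        data=jsonld_bytes,  # contents of the json-ld file, as bytes
        method="POST",
        headers={
            "X-Dataverse-key": api_token,
            # Assumption: this header does not appear in the curl command.
            "Content-Type": "application/ld+json",
        },
    )
```

Passing the returned object to ``urllib.request.urlopen`` would perform the call; as with curl, the API token must belong to a superuser.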

An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>` . Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance.
An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>` . Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance.

You also need to replace the :code:`dataverse.siteUrl` in the json-ld :code:`@context` with your current Dataverse site URL. This is necessary to define a local URI for metadata terms originating from community metadata blocks (in the case of the example file, from the Social Sciences and Humanities and Geospatial blocks).

As of Dataverse 6.5 and earlier, community metadata blocks do not assign a default global URI to the terms used in the block, in contrast to citation metadata, which has a global URI defined.
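
The :code:`dataverse.siteUrl` replacement can be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: the helper name ``localize_context`` is hypothetical, ``@context`` is assumed to be a flat JSON object of term-to-URI strings as in the example file, and ``https://demo.dataverse.org`` is only an illustrative site URL.

```python
PLACEHOLDER = "dataverse.siteUrl"


def localize_context(doc: dict, site_url: str) -> dict:
    """Return a copy of doc whose @context URIs use site_url, not the placeholder.

    Hypothetical helper: assumes "@context" is a flat object mapping
    prefixes or terms to URI strings.
    """
    base = site_url.rstrip("/")
    context = doc.get("@context", {})
    fixed = {
        term: uri.replace(PLACEHOLDER, base) if isinstance(uri, str) else uri
        for term, uri in context.items()
    }
    # Build a new top-level dict so the caller's document is left untouched.
    return {**doc, "@context": fixed}


example = {
    "title": "Test Dataset",
    "@context": {
        "geospatial": "dataverse.siteUrl/schema/geospatial#",
        "schema": "http://schema.org/",
    },
}
localized = localize_context(example, "https://demo.dataverse.org")
print(localized["@context"]["geospatial"])
```

Entries that do not contain the placeholder, such as the ``schema`` prefix, pass through unchanged.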



Member
I'm just putting this comment at the bottom.

@mjlassila from our conversation I was sort of hoping you'd include your BaseX and XQuery script at https://gist.github.com/mjlassila/ecdbd11447ccdf87995db20bfc5e686c 😄 . Is it worth writing up and including? (I don't even know how to run it but I'm happy to learn!)

Contributor Author
@pdurbin I added details about the OAI-ORE format used in the export and included a simple XQuery script for conversion. XQuery is a rather esoteric language and not very well known, but it is highly powerful in many use cases, such as format conversion.

Publish a Migrated Dataset
--------------------------