DOE OSTI DOIs for input4MIPs #177

Open
durack1 opened this issue Jan 15, 2025 · 8 comments
Comments

@durack1
Contributor

durack1 commented Jan 15, 2025

Just adding a placeholder issue so we can centralize information about what the OSTI DOI service requires from authors to get a DOI issued.

We can then update the source_id and institution_id registration info with the additional fields.

ping @jitendra-kumar @sashakames

@durack1
Contributor Author

durack1 commented Jan 16, 2025

FYI to self, 210 DOIs were issued by the CMIP6-era citation service for input4MIPs, so maybe we need to bump up the ~100 number we've discussed - see https://www.wdc-climate.de/ui/statistics?type=cmip6_doi_registration.

Also relevant is the CMIP6 Data Citation and Long-Term Archival wiki - https://redmine.dkrz.de/projects/cmip6-lta-and-data-citation/wiki

@durack1
Contributor Author

durack1 commented Jan 28, 2025

Hi @jitendra-kumar. Just circling back on this task: is there any progress to report? We have a project meeting tomorrow, so I'm keen to update the data providers on the status and timing.

@jitendra-kumar

@durack1

Here's a summary of the fields we need to register with OSTI. Much (but not all) of this information already exists in the JSONs in this repo, so we can pull it together from the existing JSONs and create a new JSON with everything needed to register a DOI for each dataset.

Product Description:

  • Dataset Title
  • Authors/Contributors
    • First/Last name
    • Email
    • ORCID
    • Affiliation
  • Related DOIs (if any) -- for cross-referencing [OPTIONAL]
  • Originating Research Organization
  • Publication Date
  • Sponsoring Organization
  • Keywords
  • Geolocation -- [WE CAN ADD THIS IF ALL DATA ARE EXPECTED TO BE GLOBAL]
  • Dataset Description/Abstract

Dataset Location:

  • Landing page URL
  • Dataset file extension. [OPTIONAL -- will be .nc in most/all cases]
  • Dataset size
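
As a rough sketch of how a record covering the fields above might be checked for completeness before registration: the key names below are hypothetical (they are not OSTI's field names), and treating geolocation as optional is an assumption, since that point is still open.

```python
# Hypothetical key names; OSTI's actual field names may differ.
REQUIRED_FIELDS = [
    "dataset_title",
    "authors",
    "originating_research_organization",
    "publication_date",
    "sponsoring_organization",
    "keywords",
    "description",
    "landing_page_url",
    "dataset_size",
]
# Assumption: geolocation is treated as optional until that is settled.
OPTIONAL_FIELDS = ["related_dois", "geolocation", "file_extension"]


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values only, for illustration.
example = {
    "dataset_title": "Example forcing dataset",
    "authors": [{"first_name": "A", "last_name": "Author"}],
    "keywords": ["input4MIPs"],
}
print(missing_fields(example))
```

A check like this could flag which fields still need to be requested from data providers, since not everything exists in the current JSONs.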

@durack1
Contributor Author

durack1 commented Jan 29, 2025

@jitendra-kumar, that's great. What is the best/easiest format for collating this info, considering this first pass is going to be manual copy-and-paste: text files or another format?

@jitendra-kumar

We should put the information together in a JSON, which would allow us to automate the process at a later date. Even in the short term, I can extract everything needed from it for manual entry.
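
A minimal sketch of that workflow, assuming one JSON record per dataset: the record is serialised with the standard `json` module, then flattened into dotted key/value lines that are easy to copy into a web form by hand. The keys and values are placeholders, not an agreed schema.

```python
import json

# Placeholder record; the keys are illustrative, not an agreed schema.
record = {
    "dataset_title": "Example forcing dataset",
    "authors": [{"first_name": "A", "last_name": "Author"}],
    "keywords": ["input4MIPs", "forcing"],
}

# Serialise and reload, as would happen when reading a file from the repo.
text = json.dumps(record, indent=2)
loaded = json.loads(text)


def flatten(value, prefix=""):
    """Yield (dotted_key, value) pairs from nested dicts and lists."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from flatten(item, f"{prefix}{key}.")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from flatten(item, f"{prefix}{index}.")
    else:
        yield prefix.rstrip("."), value


for key, value in flatten(loaded):
    print(f"{key}: {value}")
```

The same JSON can later feed an automated submission without changing how the information is collected.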

@sashakames

We will need to create .html landing pages. The .json could be used to render those, so we would need to put together a template and then push the rendered pages to gh-pages. This could be done with GitHub Actions.
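
One way the rendering step could look, using only the Python standard library (`string.Template` and `html.escape`). The template layout, the `render_landing_page` helper, and the record keys are all hypothetical; a real template would need whatever OSTI expects on a landing page.

```python
import html
from string import Template

# Hypothetical minimal template; a real landing page would carry more fields.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$description</p>
<ul>
$authors
</ul>
</body>
</html>
""")


def render_landing_page(record: dict) -> str:
    """Render one dataset record (hypothetical keys) to an HTML string."""
    authors = "\n".join(
        f"<li>{html.escape(a['first_name'])} {html.escape(a['last_name'])}</li>"
        for a in record["authors"]
    )
    return PAGE_TEMPLATE.substitute(
        title=html.escape(record["dataset_title"]),
        description=html.escape(record["description"]),
        authors=authors,
    )


page = render_landing_page(
    {
        "dataset_title": "Example & test",
        "description": "Placeholder description.",
        "authors": [{"first_name": "A", "last_name": "Author"}],
    }
)
print(page)
```

A script like this could run in a workflow job that writes one page per JSON file before the pages are published.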

@znichollscr
Collaborator

znichollscr commented Jan 29, 2025

Do you have any ideas for the schema, @jitendra-kumar? E.g. do certain fields need to be strings/booleans/lists etc.? I think that is the key. Once we have the schema, writing data to match it is relatively trivial.

Even just something like the below:

Schema proposal
from attrs import define

@define
class Author:
    first_name: str
    last_name: str
    orcid: str  # would we also validate this, probably a good idea if easy
    affiliation: str
    affiliation_ror: str | None  # optional for anyone whose institute isn't registered
    

@define
class Product:
    dataset_title: str
    authors: list[Author]
    related_dois: list[str]  # should validate that these are DOIs
    originating_research_organisation: str  # I find this field a bit weird, given most things have multiple authors therefore source organisations and the authors have affiliations anyway
    publication_date: str  # YYYY-MM-DD I guess?
    sponsoring_organisation: str  # as above re needing multiple and info already being in author info. Also unclear to me what the difference from the other orgs is so I would suggest making this optional if we can
    keywords: list[str]
    geolocation: tuple[float, float]  # what do we put here? Lat/lon co-ords? Would suggest making this optional or dropping if we can
    description: str
    dataset_location: tuple[float, float]  # what do we put here? Lat/lon co-ords? Would suggest making this optional or dropping if we can. Or do I misunderstand this field?

@define
class Dataset:
    url: str  # validate this is a URL
    extension: str
    size: float  # in bytes I guess?
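
The validation questions in the comments above could be handled with small standalone checks along these lines. This is only a sketch: the function names are hypothetical, the DOI check is a loose pattern match rather than a registry lookup, and the URL check only confirms an http(s) scheme and a host. The ORCID check uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers.

```python
import re
from urllib.parse import urlparse

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
# Loose pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the MOD 11-2 check digit of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected


def looks_like_doi(doi: str) -> bool:
    """Pattern check only; does not confirm that the DOI resolves."""
    return bool(DOI_PATTERN.match(doi))


def looks_like_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_orcid("0000-0002-1825-0097"))
print(looks_like_doi("10.1234/example-suffix"))
print(looks_like_url("https://example.org/landing/page.html"))
```

Checks like these could be attached to the attrs classes as field validators, or run as a separate pass over the JSON files.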

@jitendra-kumar

@znichollscr, I'm working on this schema so that it is consistent with what OSTI wants. Will have something to share soon.
