Pre-release revisions and alignment with CDS v5.0.4 #169

Open
wants to merge 107 commits into
base: main

Bankso
Contributor

@Bankso Bankso commented Feb 8, 2025

Fixes #151 and adds DOI columns noted in #159

This is an extension of the data model refactor and alignment with CDS model v5.0.4.

This update reflects the current Synapse table schemas used in grant-specific projects and UNION tables for GrantView, PublicationView, DatasetView, ToolView, and EducationalResource components.

Also included in this update are updated models for participant, specimen, file, and assay metadata, aligned with CDS v5.0.4 and mapped via this document.

Changelog

  • Added valid values for CDS data model, v5.0.4

  • Added study- and specimen-focused attributes to better cover data elements requested by the CDS model

  • Added DOI columns for Dataset View, Tool View, and Educational Resource models

  • Removed GeoMx attributes covered by Sequencing models

  • Added missing "Key" attributes, to support data model linkages

  • Specific model types mapped to CDS 5.0.4 and updated:

    • Study
    • Biospecimen
    • Individual
    • Model
    • File View
    • Sequencing Level 1 and 2
    • Imaging Level 2, 3, and Channel
  • Models are also available for:

    • Data Sharing Plan
    • Sequencing Level 3
    • Imaging Level 1 and 4
    • GeoMx Level 1, 2, 3, Image, ROI/Annotations (updated), and Auxiliary files (added)
    • Visium Level 1, 2, 3, 4, and Auxiliary files (updated)
    • Sequencing RNA Level 1 (added)
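
To illustrate the "Removed GeoMx attributes covered by Sequencing models" item, here is a minimal sketch of how the GeoMx-specific remainder could be derived as a set difference. The helper `model_specific_attributes` and the attribute `GeoMx DSP Run Id` are hypothetical and only there for illustration; the real attribute lists live in the model CSVs.

```python
# Hypothetical attribute lists, for illustration only; not copied from the actual models.
SEQUENCING_LEVEL_1 = {"Filename", "Library Layout", "NGS Raw Reads", "NGS Unique Bases"}
GEOMX_LEVEL_1_BEFORE = {"Filename", "Library Layout", "NGS Raw Reads", "GeoMx DSP Run Id"}


def model_specific_attributes(model, shared_model):
    """Return the attributes of `model` that `shared_model` does not already define."""
    return sorted(set(model) - set(shared_model))


# Attributes that would remain in a GeoMx-only model under these assumptions.
geomx_only = model_specific_attributes(GEOMX_LEVEL_1_BEFORE, SEQUENCING_LEVEL_1)
print(geomx_only)
```

Under these assumed inputs, only the GeoMx-specific attribute remains; everything shared is left to the Sequencing Level 1 model.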

Sequencing Level 1 model contains everything needed, so GeoMx Level 1 can be limited to just GeoMx-specific attributes
Sequencing Level 2 model contains everything needed, so GeoMx Level 2 can be limited to just GeoMx-specific attributes
Sequencing Level 3 model contains everything needed, so GeoMx Level 3 can be limited to just GeoMx-specific attributes
Already contained in Sequencing Level 1
Include attributes from shared that capture workflow details
- added DSP Dataset Metadata attribute and added to schema def
- update descriptions to provide better guidance
Add "Metadata" as a file level, to help indicate that the information must be mapped onto a metadata template
CV updates from script
Using schematic 24.12.1
Add attribute GeoMx Imaging AOI Coordinates and update schema definition.

This attribute allows a submitter to identify specific coordinate/mask files to be used with a GeoMx image
Using schematic 24.12.1
Retain Publication Dataset Alias to simplify curation and database updates
Move json to json_schemas and csvs to templates
Add output file handling
Replace Component Key attributes in Collection with Component Table Id attributes, which will be used to log table subsets generated for sharing alongside file Datasets
Using schematic v24.12.1
Add data type qualifier to description
Added DOI attributes for Dataset View, Tool View, and Educational Resource models
Also added Keys to Educational Resource, but retained the tool and dataset link attributes, in case the educational resource uses resources from outside of Synapse
For mapping and conversion
Bankso added 26 commits February 7, 2025 17:07
Align valid values from CDS v5.0.4
Align valid values from CDS v5.0.4
Align valid values from CDS v5.0.4
Align valid values from CDS v5.0.4
Align valid values from CDS v5.0.4
Align valid values from CDS v5.0.4
Align valid values from CDS v5.0.4
These are stored in visium-specific folders
Remove unneeded attributes, covered by File View template
These values live in sequencing-specific folders, CSVs in shared were redundant
Equivalent to tissueOrganOriginCDS.csv, so I deleted this file and replaced mapping reference with reference to tissueOrganOriginCDS.csv
Align mappings with current set of valid values
Added Biospecimen Treatment Response, Incidence Type
Added valid values for Acquisition method
Removed Biospecimen Pathology
Updated by update valid values script
Updated by update valid values script
Updated by update valid values script
Updated by update valid values script
Updated by update valid values script
Updated by update valid values script; moved attributes NGS Raw Reads and NGS Unique Bases from sequencing level 2 to sequencing level 1 and added them to the level 1 model definition. These are requested by the CDS model
Updated by update valid values script; removed attributes now in sequencing level 1
Updated by update valid values script
Updated by update valid values script
generated via schematic 24.12.1
This will be maintained locally until a clear need for storage in the repo is identified
@Bankso Bankso requested a review from aditigopalan as a code owner February 8, 2025 01:59
@Bankso Bankso added the major label (PR label for a major update) Feb 8, 2025
@Bankso
Contributor Author

Bankso commented Feb 8, 2025

@aditigopalan please let me know if anything looks off! I was able to generate a google sheet template using the JSON-LD, so it at least works 😄
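
A lighter-weight check than generating a full template is to load the JSON-LD and list its attributes. The sketch below is only an illustration: it assumes a schematic-style layout with a top-level `@graph` whose nodes carry `sms:displayName` and `sms:required`, and the inline example with the attribute names `Dataset Doi` and `Study Key` is hypothetical, not taken from the repository.

```python
import json

# Inline stand-in for a data model JSON-LD file.
# The structure and field names are assumptions; the attributes are hypothetical.
EXAMPLE_JSONLD = """
{
  "@graph": [
    {"@id": "bts:DatasetDoi", "sms:displayName": "Dataset Doi", "sms:required": "sms:false"},
    {"@id": "bts:StudyKey", "sms:displayName": "Study Key", "sms:required": "sms:true"}
  ]
}
"""


def list_attributes(jsonld_text):
    """Map each attribute's display name to whether it is marked as required."""
    graph = json.loads(jsonld_text).get("@graph", [])
    return {
        node["sms:displayName"]: node.get("sms:required") == "sms:true"
        for node in graph
        if "sms:displayName" in node
    }


attributes = list_attributes(EXAMPLE_JSONLD)
for name, required in sorted(attributes.items()):
    print(f"{name}: {'required' if required else 'optional'}")
```

To try this against a real model file, the inline string could be replaced with the contents of the JSON-LD on disk, provided the file follows the assumed layout.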

Labels
major (PR label for a major update)
Development

Successfully merging this pull request may close these issues.

Data model updates integration testing
1 participant