Make and schema convert
Schematic v24.10.2
Bankso committed Oct 31, 2024
1 parent cff382a commit e1f351e
Showing 2 changed files with 399 additions and 372 deletions.
21 changes: 12 additions & 9 deletions mc2.model.csv
@@ -334,7 +334,7 @@ Publication View,The denormalized manifest for publication submission.,,"Compone
Publication Grant Number,"Relevant grant number associated witht the publication's development. Multiple values permitted, comma separated.","Affiliated/Non-Grant Associated, CA184897, CA184898, CA188388, CA193313, CA193417, CA193419, CA193461, CA193489, CA195469, CA199315, CA202123, CA202144, CA202177, CA202229, CA202241, CA209891, CA209923, CA209971, CA209975, CA209978, CA209988, CA209992, CA209997, CA210152, CA210173, CA210180, CA210181, CA210184, CA210190, CA214282, CA214292, CA214297, CA214300, CA214354, CA214369, CA214381, CA214411, CA215709, CA215794, CA215798, CA215845, CA215848, CA217297, CA217376, CA217377, CA217378, CA217450, CA217456, CA217514, CA217613, CA217617, CA217655, CA220378, CA223976, CA224012, CA224013, CA224044, CA225088, CA225566, CA227136, CA227544, CA227550, CA228608, CA228963, CA231978, CA232137, CA232161, CA232209, CA232216, CA232382, CA232517, CA234787, CA235747, CA238475, CA238720, CA238728, CA240301, CA241137, CA241927, CA243004, CA243007, CA243072, CA243073, CA243075, CA244100, CA244101, CA244107, CA244109, CA245313, CA248890, CA249799, CA250040, CA250044, CA250046, CA250481, CA251443, CA253248, CA253472, CA253540, CA253547, CA253553, CA254200, CA254886, CA256054, CA256481, CA260432, CA261694, CA261701, CA261717, CA261719, CA261822, CA261841, CA261842, CA263001, CA264583, CA264610, CA264611, CA264620, CA267170, CA268069, CA268072, CA268083, CA268084, CA271273, CA274492, CA274494, CA274499, CA274502, CA274509, CA275808, CA279408, CA279560, CA280984, CA280849, CA280829, CA284090, CA284086, CA274504, CA283114, CA274507, CA274506, CA274511, CA282451",,TRUE,,,,,list like
Publication Dataset Alias,"A list of the dataset aliases (An alias of the dataset must be unique. Can be the GEO identifier such as GSE12345, or a DOI. No Greek Letters) associated with the publication. Multiple values permitted, comma separated.",,,FALSE,,,,,list like
PublicationView_id,A unique primary key that enables record updates using schematic.,,,TRUE,,,,,unique
-Sequencing Level 1,Unaligned/unprocessed sequencing data ,,"Component, SequencingLevel1_id, Filename, NGS Library Strategy, NGS Library Source, NGS Library Selection Method, NGS Library Layout, NGS Sequencing Platform, NGS Sequencing Design Description, NGS Library Preparation Kit Name, NGS Library Preparation Kit Vendor, NGS Library Preparation Kit Version, NGS Read Indicator, NGS Library Preparation Days from Index, NGS Sequencing Library Construction Days from Index, NGS End Bias, NGS Reverse Transcription Primer, NGS RIN, NGS DV200",FALSE,,,,MC2,
+Sequencing Level 1,Unaligned/unprocessed sequencing data ,,"Component, SequencingLevel1_id, Filename, NGS Library Strategy, NGS Library Source, NGS Library Selection Method, NGS Library Layout, NGS Sequencing Platform, NGS Sequencing Design Description, NGS Library Preparation Kit Name, NGS Library Preparation Kit Vendor, NGS Library Preparation Kit Version, NGS Read Indicator, NGS Library Preparation Days from Index, NGS Sequencing Library Construction Days from Index",FALSE,,,,MC2,
SequencingLevel1_id,A unique identifier used by schematic for record updates and as a reference key in other schemas,,,TRUE,,,,MC2,unique
NGS Sequencing Coverage,Depth of coverage on assembly used. Found by (Unique Aligned Basecalls)/(Reference Length),,,TRUE,,,,HTAN,num
NGS Read Length,"The average length of the sequencing reads. Can be integer, null",,,TRUE,,,,HTAN,num
@@ -350,10 +350,6 @@ NGS Sequencing Design Description,Free-form description of the methods used to c
NGS Read Indicator,"Indicate if this is Read 1 (R1), Read 2 (R2), Index Reads (I1), or Other","R1, R2, R1&R2, I1, Other",,TRUE,,,,HTAN,str
NGS Library Preparation Days from Index,Number of days between sample for assay was received in lab and the libraries were prepared for sequencing [number]. If not applicable please enter 'Not Applicable',,,FALSE,,,,HTAN,regex match \d+$|Not\sApplicable$|unknown$
NGS Sequencing Library Construction Days from Index,Number of days between sample for assay was received in lab and day of sequencing library construction [number]. If not applicable please enter 'Not Applicable',,,TRUE,,,,HTAN,regex match \d+$|Not\sApplicable$|unknown$
-NGS End Bias,"The end of the cDNA molecule that is preferentially sequenced, e.g. 3/5 prime tag/end or the full length transcript","3 Prime, 5 Prime, Full Length Transcript",,TRUE,,,,HTAN,str
-NGS Reverse Transcription Primer,"An oligo to which new deoxyribonucleotides can be added by DNA polymerase [SO_0000112]. The type of primer used for reverse transcription, e.g. oligo-dT or random primer. This allows users to identify content of the cDNA library input e.g. enriched for mRNA","Oligo-dT, Poly-dT, Feature barcoding, Random",,TRUE,,,,HTAN,str
-NGS RIN,A numerical assessment of the integrity of RNA based on the entire electrophoretic trace of the RNA sample including the presence or absence of degradation products. Number,,,FALSE,,,,HTAN,num
-NGS DV200,Represents the percentage of RNA fragments that are >200 nucleotides in size. Number,,,FALSE,,,,HTAN,num
Sequencing Level 2,"Sequencing data that has been aligned with a reference genome.",,"Component, SequencingLevel2_id, SequencingLevel1 Key, Filename, NGS Library Strategy, NGS Library Source, NGS Library Selection Method, NGS Library Layout, NGS Sequencing Platform, NGS Sequencing Design Description, NGS Raw reads, NGS Stitched reads, NGS Aligned reads, NGS Deduplicated reads, NGS Trimmed reads, NGS MapQ30, NGS Unique Bases, NGS Read Length, NGS Sequencing Coverage, Genomic Reference, Software and Version",FALSE,,,,,
SequencingLevel2_id,A unique identifier used by schematic for record updates and as a reference key in other schemas,,,TRUE,,,,MC2,unique
NGS Raw reads,"Reads not yet analyzed in any way to be used for data analysis. The number of reads that pass filter from the flow cell represented in the FASTQ file. ",,,FALSE,,,,HTAN,num
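Note on the `regex match \d+$|Not\sApplicable$|unknown$` rule carried by the two "Days from Index" rows in this hunk: the sketch below shows which values such a pattern accepts. It assumes that schematic's `regex match` rule behaves like Python's `re.match` (anchored at the start of the value) and contrasts it with `re.search`; the helper names `passes_match` and `passes_search` are hypothetical and are not part of schematic.

```python
import re

# Pattern copied from the validation-rule column of the rows above.
PATTERN = r"\d+$|Not\sApplicable$|unknown$"


def passes_match(value: str) -> bool:
    """Assumed 'regex match' semantics: the pattern must match from the start."""
    return re.match(PATTERN, value) is not None


def passes_search(value: str) -> bool:
    """For contrast: the pattern may match anywhere in the value."""
    return re.search(PATTERN, value) is not None


if __name__ == "__main__":
    samples = ["14", "Not Applicable", "unknown", "14 days", "day 14", "not applicable"]
    for sample in samples:
        print(f"{sample!r:18} match={passes_match(sample)} search={passes_search(sample)}")
```

Under these assumptions the pattern is case-sensitive, so `not applicable` is rejected, and a value such as `day 14` passes only under search semantics because every alternative is anchored at the end but not at the start.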
@@ -369,6 +365,12 @@ SequencingLevel3_id,A unique identifier used by schematic for record updates and
NGS Unique Probe Count,Total number of unique probes reported.,,,FALSE,,,,HTAN,num
NGS Unique Target Count,Total number of unique genes reported.,,,FALSE,,,,HTAN,num
NGS Matrix Type,Type of count data stored in matrix.,"Raw Counts, Normalized Counts, Scaled Counts, Batch Corrected Counts",,FALSE,,,,HTAN,
+Sequencing RNA Level 1,Unaligned/unprocessed RNA sequencing data ,,"Component, SequencingRNALevel1_id, Filename, NGS RNA End Bias, NGS RNA Reverse Transcription Primer, NGS RNA RIN, NGS RNA DV200",FALSE,,,Sequencing Level 1,MC2,
+SequencingRNALevel1_id,A unique identifier used by schematic for record updates and as a reference key in other schemas,,,TRUE,,,,MC2,unique
+NGS RNA End Bias,"The end of the cDNA molecule that is preferentially sequenced, e.g. 3/5 prime tag/end or the full length transcript","3 Prime, 5 Prime, Full Length Transcript",,TRUE,,,,HTAN,str
+NGS RNA Reverse Transcription Primer,"An oligo to which new deoxyribonucleotides can be added by DNA polymerase [SO_0000112]. The type of primer used for reverse transcription, e.g. oligo-dT or random primer. This allows users to identify content of the cDNA library input e.g. enriched for mRNA","Oligo-dT, Poly-dT, Feature barcoding, Random",,TRUE,,,,HTAN,str
+NGS RNA RIN,A numerical assessment of the integrity of RNA based on the entire electrophoretic trace of the RNA sample including the presence or absence of degradation products. Number,,,FALSE,,,,HTAN,num
+NGS RNA DV200,Represents the percentage of RNA fragments that are >200 nucleotides in size. Number,,,FALSE,,,,HTAN,num
Component,"Category of metadata (e.g. Tools, Publications, etc.); provide the same one for all items/rows.",,,TRUE,,shared,,,
Data Use Codes,"DUO code - A data item that is used to indicate consent permissions for datasets and/or materials, and relates to the purposes for which datasets and/or material might be removed, stored or used. Available DUO code definitions can be found here: https://mc2-center.github.io/data-models/valid_values/file/#attribute-data-use-codes","IRB, HMB, PUB, US, NPOA, COL, NCU, NPUNCU, RS, TS, NRES, NPU, DUM, POA, MOR, GSO, RTN, CC, NMDS, IS, GS, DS, GRU, PS",,FALSE,,shared,,Data Use Ontology,list like
ProjectView Key,"Unique ProjectView_id foreign key(s) that group the resource with other components, as part of the same grant-associated studies. Please provide multiple values as a comma-separated list.",,,FALSE,,project,,MC2,matchAtLeastOne ProjectView.ProjectView_id Value
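Note on the new `Sequencing RNA Level 1` component: a rename like `NGS End Bias` to `NGS RNA End Bias` is easy to apply in one place and miss in another, so a quick consistency check can help. The sketch below is only an illustration. It assumes the attribute name is the first CSV column and the dependency list is the fourth, which is inferred from the rows in this diff because the header row is not shown. `ROWS` holds abbreviated copies of the rows from this hunk with shortened descriptions, and `undefined_dependencies` is a hypothetical helper, not part of schematic.

```python
import csv
import io

# Assumed column positions (the header row is not part of this diff).
ATTRIBUTE_COL = 0
DEPENDS_ON_COL = 3

# Abbreviated rows from this hunk; descriptions are shortened for the example.
ROWS = '''\
Sequencing RNA Level 1,Unaligned/unprocessed RNA sequencing data,,"Component, SequencingRNALevel1_id, Filename, NGS RNA End Bias, NGS RNA Reverse Transcription Primer, NGS RNA RIN, NGS RNA DV200",FALSE,,,Sequencing Level 1,MC2,
SequencingRNALevel1_id,A unique identifier,,,TRUE,,,,MC2,unique
NGS RNA End Bias,End of the cDNA molecule,"3 Prime, 5 Prime, Full Length Transcript",,TRUE,,,,HTAN,str
NGS RNA Reverse Transcription Primer,Primer type,"Oligo-dT, Poly-dT, Feature barcoding, Random",,TRUE,,,,HTAN,str
NGS RNA RIN,RNA integrity number,,,FALSE,,,,HTAN,num
NGS RNA DV200,Percentage of fragments over 200 nt,,,FALSE,,,,HTAN,num
Component,Category of metadata,,,TRUE,,shared,,,
'''


def undefined_dependencies(csv_text: str) -> dict[str, list[str]]:
    """Map each component to the dependencies that have no row of their own."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    defined = {row[ATTRIBUTE_COL].strip() for row in rows}
    missing: dict[str, list[str]] = {}
    for row in rows:
        deps = [d.strip() for d in row[DEPENDS_ON_COL].split(",") if d.strip()]
        absent = [d for d in deps if d not in defined]
        if absent:
            missing[row[ATTRIBUTE_COL].strip()] = absent
    return missing


if __name__ == "__main__":
    print(undefined_dependencies(ROWS))
```

Run against only the rows excerpted here, the check reports `Filename` as undefined because that attribute's row lies outside this hunk. Run against the full `mc2.model.csv`, an empty result would suggest that every dependency name resolves to a defined attribute.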
@@ -401,6 +403,7 @@ SequencingLevel1 Key,Unique SequencingLevel1_id foreign key(s) that link metadat
SequencingLevel2 Key,Unique SequencingLevel2_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,sequencing,,MC2,matchAtLeastOne SequencingLevel2.SequencingLevel2_id Value
SequencingLevel3 Key,Unique SequencingLevel3_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,sequencing,,MC2,matchAtLeastOne SequencingLevel3.SequencingLevel3_id Value
SequencingLevel4 Key,Unique SequencingLevel4_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,sequencing,,MC2,matchAtLeastOne SequencingLevel4.SequencingLevel4_id Value
+SequencingRNALevel1 Key,Unique SequencingRNALevel1_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,sequencing RNA,,MC2,matchAtLeastOne SequencingRNALevel1.SequencingRNALevel1_id Value
10xVisiumRNALevel1 Key,Unique 10xVisiumRNALevel1_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,visium,,MC2,matchAtLeastOne 10xVisiumRNALevel1.10xVisiumRNALevel1_id Value
10xVisiumRNALevel2 Key,Unique 10xVisiumRNALevel2_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,visium,,MC2,matchAtLeastOne 10xVisiumRNALevel2.10xVisiumRNALevel2_id Value
10xVisiumRNALevel3 Key,Unique 10xVisiumRNALevel3_id foreign key(s) that link metadata entries as part of the same Dataset. Please provide multiple values as a comma-separate list.,,,FALSE,,visium,,MC2,matchAtLeastOne 10xVisiumRNALevel3.10xVisiumRNALevel3_id Value
@@ -495,9 +498,9 @@ Tool Type,"A type of application software: a discrete software entity can have m
Tool Version,Version information (typically a version number) of the software applicable to this entry.,,,FALSE,,,,biotoolsschema,
Tool View,The denormalized manifest for tool submission.,,"Component, ToolView_id, Study key, DatasetView Key, PublicationView Key, Tool Name, Tool Description, Tool Homepage, Tool Version, Tool Operation, Tool Input Data, Tool Output Data, Tool Input Format, Tool Output Format, Tool Function Note, Tool Cmd, Tool Type, Tool Topic, Tool Operating System, Tool Language, Tool License, Tool Cost, Tool Accessibility, Tool Download Url, Tool Download Type, Tool Download Note, Tool Download Version, Tool Documentation Url, Tool Documentation Type, Tool Documentation Note, Tool Link Url, Tool Link Type, Tool Link Note, Tool Date Last Modified, Tool Release Date, Tool Package Dependencies, Tool Package Dependencies Present, Tool Compute Requirements, Tool Entity Name, Tool Entity Type, Tool Entity Role",FALSE,,,Study,,
ToolView_id,A unique primary key that enables record updates using schematic.,,,TRUE,,,,,unique
-10x Visium Auxiliary Files,"Auxiliary data associated with spot/slide analysis (aligned Images, quality control files, etc) from Spatial Transcriptomics.",,"Component,Filename,File Format,Biospecimen Key,10xVisiumAuxiliaryFiles_id,10xVisiumRNALevel1 Key,10xVisiumRNALevel2 Key,10xVisiumRNALevel3 Key,10xVisiumRNALevel4 Key,Run ID,Visium File Type,Slide ID,Capture Area,Workflow Version,Workflow Link",FALSE,,visium,"10x Visium RNA Level 1, 10x Visium RNA Level 2, 10x Visium RNA Level 3, 10x Visium RNA Level 4",HTAN,
+10x Visium Auxiliary Files,"Auxiliary data associated with spot/slide analysis (aligned Images, quality control files, etc) from Spatial Transcriptomics.",,"Component, Filename, 10xVisiumAuxiliaryFiles_id, 10xVisiumRNALevel1 Key, 10xVisiumRNALevel2 Key, 10xVisiumRNALevel3 Key, 10xVisiumRNALevel4 Key, Visium Run ID, Visium File Type, Visium Slide ID, Capture Area, Workflow Version, Workflow Link",FALSE,,visium,"10x Visium RNA Level 1, 10x Visium RNA Level 2, 10x Visium RNA Level 3, 10x Visium RNA Level 4",HTAN,
10xVisiumAuxiliaryFiles_id,"Unique row identifier, used as a primary key for record updates",,,TRUE,,visium,,MC2,unique
-10x Visium RNA Level 1,Files contain raw RNA-seq data associated with spot/slide data.,,"Component,10xVisiumRNALevel1_id,Filename,Visium Run ID,File Format,Biospecimen Key,File Alias,NGS Read Indicator,Visium Spatial Read1,Visium Spatial Read2,Visium Spatial Library Construction Method,NGS Library Preparation Days from Index,NGS Sequencing Library Construction Days from Index,NGS End Bias,NGS Reverse Transcription Primer,NGS Sequencing Platform,Visium Capture Area,Protocol Link,Visium Slide Version,Visium Slide ID,Visium Image Re-orientation,Visium Permeabilization Time,NGS RIN,NGS DV200",FALSE,,,File View,,
+10x Visium RNA Level 1,Files contain raw RNA-seq data associated with spot/slide data.,,"Component, 10xVisiumRNALevel1_id, Filename, Visium Run ID, NGS Read Indicator, Visium Spatial Read1, Visium Spatial Read2, Visium Spatial Library Construction Method, NGS Library Preparation Days from Index, NGS Sequencing Library Construction Days from Index, NGS RNA End Bias, NGS RNA Reverse Transcription Primer, NGS Sequencing Platform, Visium Capture Area, Protocol Link, Visium Slide Version, Visium Slide ID, Visium Image Re-orientation, Visium Permeabilization Time, NGS RNA RIN, NGS RNA DV200",FALSE,,,File View,,
10xVisiumRNALevel1_id,"Unique row identifier, used as a primary key for record updates",,,TRUE,,visium,,,unique
Visium Run ID,A unique identifier for this individual run (typically associated with a single slide) of the spatial transcriptomic processing workflow.,,,TRUE,,visium,,HTAN,unique
Visium Capture Area,"Area (or Capture Area) - One of the either four or two active regions where tissue can be placed on a Visium slide. Each area is intended to contain only one tissue sample. Slide areas are named consecutively from top to bottom: A1, B1, C1, D1 for Visium slides with 6.5 mm Capture Area and A, B for CytAssist slides with 11 mm Capture Area. Both CytAssist slides with 6.5 mm Capture Area and Gateway Slides contain only two slide areas, A1 and D1.","A, B, C, D, A1, B1, C1, D1",,FALSE,,visium,,HTAN,str
@@ -514,7 +517,7 @@ Visium UMI Tag,"SAM tag for the UMI field; please provide a valid tag-type pair,
Visium Whitelist Spatial Barcode File Link,Link to file listing all possible spatial barcodes. URL,,,TRUE,,visium,,HTAN,url
Visium Spatial Barcode Tag,"SAM tag for spot barcode field; please provide a valid tag-type pair, consisting of a tag (e.g. CB or CR) and type (e.g. Z) separated by a colon.",,,TRUE,,visium,,HTAN,str
Visium Applied Hard Trimming,Was Hard Trimming applied,"True, False",,TRUE,,visium,,HTAN,str
-10x Visium RNA Level 3,Processed data files based on Spatial Transcriptomics RNA-seq Level 2 and Spatial Transcriptomics Auxiliary files.,,"Component,10xVisiumRNALevel3_id,Filename,File Format,Biospecimen Key,10xVisiumRNALevel2 Key,10xVisiumAuxiliaryFiles Key,File Alias,Visium Run ID,Visium File Type,Workflow Version,Workflow Link,Visium Capture Area,Visium Spots under tissue,Visium Mean Reads per Spatial Spot,Visium Median Number Genes per Spatial Spot,NGS Sequencing Coverage,Visium Proportion Reads Mapped,Visium Proportion Reads Mapped to Transcriptome,Visium Median UMI Counts per Spot",FALSE,,visium,10x Visium RNA Level 2,HTAN,
+10x Visium RNA Level 3,Processed data files based on Spatial Transcriptomics RNA-seq Level 2 and Spatial Transcriptomics Auxiliary files.,,"Component, 10xVisiumRNALevel3_id, Filename, 10xVisiumRNALevel2 Key, 10xVisiumAuxiliaryFiles Key, Visium Run ID, Visium File Type, Workflow Version, Workflow Link, Visium Capture Area, Visium Spots under tissue, Visium Mean Reads per Spatial Spot, Visium Median Number Genes per Spatial Spot, NGS Sequencing Coverage, Visium Proportion Reads Mapped, Visium Proportion Reads Mapped to Transcriptome, Visium Median UMI Counts per Spot",FALSE,,visium,10x Visium RNA Level 2,HTAN,
10xVisiumRNALevel3_id,"Unique row identifier, used as a primary key for record updates",,,TRUE,,visium,,MC2,unique
Visium File Type,The file type generated for the visium experiment.,"reference png, reference jpg, json scale factors, probe dataset csv, qc result html, filtered mex, unfiltered mex, tissue_positions, barcodes, features, fiducial image png, fiducial image jpg, detected image png, detected jpg, high res image, low res image, json scale factors, probe dataset csv",,TRUE,,visium,,HTAN,str
Visium Spots under tissue,The number of barcodes associated with a spot under tissue.,,,TRUE,,visium,,HTAN,num
@@ -523,7 +526,7 @@ Visium Median Number Genes per Spatial Spot,The median number of genes detected
Visium Proportion Reads Mapped,Proportion of mapped reads collected from samtools. Number,,,FALSE,,visium,,HTAN,num
Visium Proportion Reads Mapped to Transcriptome,Fraction of reads that mapped to a unique gene in the transcriptome. The read must be consistent with annotated splice junctions. These reads are considered for UMI counting.,,,TRUE,,visium,,HTAN,num
Visium Median UMI Counts per Spot,The median number of UMI counts per tissue covered spot.,,,TRUE,,visium,,HTAN,num
-10x Visium RNA Level 4,Processed data files based on Spatial Transcriptomics RNA-seq Level 3.,,"Component,10xVisiumRNALevel4_id,Filename,File Format,Biospecimen Key,10xVisiumRNALevel3 Key,Run ID,Workflow Version,Workflow Link,Visium Workflow Type,Visium Workflow Parameters Description",FALSE,,visium,10x Visium RNA Level 3,HTAN,
+10x Visium RNA Level 4,Processed data files based on Spatial Transcriptomics RNA-seq Level 3.,,"Component, 10xVisiumRNALevel4_id, Filename, 10xVisiumRNALevel3 Key, Visium Run ID, Workflow Version, Workflow Link, Visium Workflow Type, Visium Workflow Parameters Description",FALSE,,visium,10x Visium RNA Level 3,HTAN,
10xVisiumRNALevel4_id,"Unique row identifier, used as a primary key for record updates",,,TRUE,,visium,,MC2,unique
Visium Workflow Type,Generic name for the workflow used to analyze the visium data set.,,,TRUE,,visium,,HTAN,str
Visium Workflow Parameters Description,Parameters used to run the workflow.,,,TRUE,,visium,,HTAN,str