From 594ad1b80807a18ec5f422e17ec1dd9871508880 Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Mon, 22 Jul 2024 13:10:46 -0400 Subject: [PATCH 01/10] initial commit --- input/fsh/modules/files.fsh | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 input/fsh/modules/files.fsh diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh new file mode 100644 index 000000000..abb3dc881 --- /dev/null +++ b/input/fsh/modules/files.fsh @@ -0,0 +1,5 @@ +/* +Files Module profiles and logical Model +*/ + + From 270b353cb0d0a2849f5fd48c44f7c5fe94021e6c Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Thu, 25 Jul 2024 09:08:43 -0400 Subject: [PATCH 02/10] :tada: create logical model and part of file profile --- input/fsh/modules/files.fsh | 79 +++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index abb3dc881..1c31f4799 100644 --- a/input/fsh/modules/files.fsh +++ b/input/fsh/modules/files.fsh @@ -2,4 +2,83 @@ Files Module profiles and logical Model */ +Logical: CdmFile +Id: SharedDataModelFile +Title: "Shared Data Model for File" +Description: "The **Shared Data Model for File**" +* participantID 1..* reference "The participant(s) for whom this file contains data" +* fileExternalID 0..1 string "A related identifier of this file" +* format 1..1 code "The file format used" +* location 1..* List "List of locations where this data can be accessed" +* location.URI 1..1 uri "The URI at which this data can be accessed" +* location.AccessPolicy 0..* reference "If present, only those under the specific Access Policy can access the file in this location." +* fileSize 1..1 Quantity "The size of the file, e.g., in bytes." 
+* hash 0..* List "Provides a list of hashes for confirming file transfers" +* hash.Type 0..1 code "Algorithm used to calculate the hash (and size, where applicable)" +* hash.Value 1..1 string "Value of hashing the file" +* contentVersion 0..1 string "Version of the file content" +* description 0..1 string "A description of the file" +* type 1..1 code "The type of data contained in this file. Should be as detailed as possible, e.g., Whole Exome Variant Calls." +* relatedFile 0..1 List "Provides a reference to another file that is related to this one" +* relatedFile.File 0..1 reference "The file to which this related file is related" +* relatedFile.Type 0..1 code "The relationship of the file to the parent file in reference" +CodeSystem: HashTypeCS +Id: example-hash-type-code-system +Title: "Hash Types Code System" +Description: "Algorithm used to calculate the hash (and size, where applicable)" +* #md5 "md5 hash type" +* #sha256 "sha256 hash type" +* #sha512 "sha512 hash type" +* #sha1 "sha1 hash type" +* #crc32 "crc32 hash type" +* #crc32c "crc32c hash type" +* #etag "etag hash type" + +CodeSystem: RelatedFileTypeCS +Id: related-file-type-code-system +Title: "Related File Type Code System" +Description: "Explains the relationship of this file to the file of reference" +* #index_of "Index of" +* #has_index "Has index" +* #data_dictionary_of "Data dictionary of" +* #has_data_dictionary "Has data dictionary" +* #plink-type-associated-files "Plink-type associated files" + +Extension: FileSize +Id: file-size +Title: "The size of the file, e.g., in bytes." +Description: "The size of the file, e.g., in bytes." 
+* value[x] only Quantity +* valueQuantity ^short = "Indicate the size of the file in reference" + +Extension: ContentVersion +Id: content-version +Title: "Version of the contents of the file" +Description: "Version of the contents of the file" +* value[x] only string +* valueString ^short = "Indicate the version (e.g., V1) for the contents of this file" + +Profile: NcpiFile +Parent: DocumentReference +Id: ncpi-file +Title: "NCPI File" +Description: "Information about a file related to a research participant" +* ^version = "0.0.1" +* ^status = #draft +* identifier 0..* +* identifier ^short = "A related external file ID" +* subject 0..1 +* subject ^short = "The participant(s) for whom this file contains data (i.e., ParticipantID)" +* extension contains FileSize named file-size 1..1 +* extension[file-size] ^short = "Indicate the size of the file in reference" +* extension contains ContentVersion named content-version 0..1 +* extension[content-version] ^short = "The version of the content in the file" +* description 0..1 +* description ^short = "A description of the file" +* type 0..1 +* type ^short = "The type of data contained in this file." + +/* +CodeSystem for EDAM-- link it as an external code system? 
+*/ \ No newline at end of file From 514e2d60ebd466d49a6dd29ae800ec45a7aab564 Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Fri, 26 Jul 2024 16:51:22 -0400 Subject: [PATCH 03/10] ":sparkles: introduce EDAM code system and write complex extension for hash information" --- input/fsh/Alias.fsh | 3 +- input/fsh/examples/files.fsh | 20 +++++++++++++ input/fsh/modules/files.fsh | 54 +++++++++++++++++++++++++++++++----- 3 files changed, 69 insertions(+), 8 deletions(-) create mode 100644 input/fsh/examples/files.fsh diff --git a/input/fsh/Alias.fsh b/input/fsh/Alias.fsh index d16519f26..df414b7f7 100644 --- a/input/fsh/Alias.fsh +++ b/input/fsh/Alias.fsh @@ -7,4 +7,5 @@ Alias: $ncpi-data-access-type = https://nih-ncpi.github.io/ncpi-fhir-ig-2/CodeSy Alias: $title-type = http://terminology.hl7.org/CodeSystem/title-type Alias: $mesh = urn:oid:2.16.840.1.113883.6.177 -Alias: $ncit = http://purl.obolibrary.org/obo/ncit.owl \ No newline at end of file +Alias: $ncit = http://purl.obolibrary.org/obo/ncit.owl +Alias: $edam = http://edamontology.org \ No newline at end of file diff --git a/input/fsh/examples/files.fsh b/input/fsh/examples/files.fsh new file mode 100644 index 000000000..7b1dc1992 --- /dev/null +++ b/input/fsh/examples/files.fsh @@ -0,0 +1,20 @@ +Instance: PT-006SP660 +InstanceOf: NcpiFile +Title: "Example file based on CBTN" +Usage: #example +Description: "Use case of file information from CBTN" +* identifier.value = "PT_006SP660" +* subject = Reference(GF_6BAD9S7D) +* description = "Annotated Variant Call" +* type = $edam#operation_3227 "Variant calling" +* status = #current +* content[0].attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" +/** content[0].profile = $ncpi-data-access-type#controlled "Controlled"*/ +* extension[file-format].valueCodeableConcept.coding = $edam#format_3016 "VCF" +* extension[file-size] + * valueQuantity + * value = 
1044770380 + * unit = "bytes" +* extension[hash] + * extension[hash-value].valueString = "8f107912d862cf91fbfb77bf9c1bab36-4" + * extension[hash-type].valueCode = #etag diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index 1c31f4799..e0b42fc88 100644 --- a/input/fsh/modules/files.fsh +++ b/input/fsh/modules/files.fsh @@ -10,18 +10,18 @@ Description: "The **Shared Data Model for File**" * fileExternalID 0..1 string "A related identifier of this file" * format 1..1 code "The file format used" * location 1..* List "List of locations where this data can be accessed" -* location.URI 1..1 uri "The URI at which this data can be accessed" -* location.AccessPolicy 0..* reference "If present, only those under the specific Access Policy can access the file in this location." +* location.uri 1..1 uri "The URI at which this data can be accessed" +* location.accessPolicy 0..* reference "If present, only those under the specific Access Policy can access the file in this location." * fileSize 1..1 Quantity "The size of the file, e.g., in bytes." * hash 0..* List "Provides a list of hashes for confirming file transfers" -* hash.Type 0..1 code "Algorithm used to calculate the hash (and size, where applicable)" -* hash.Value 1..1 string "Value of hashing the file" +* hash.type 0..1 code "Algorithm used to calculate the hash (and size, where applicable)" +* hash.value 1..1 string "Value of hashing the file" * contentVersion 0..1 string "Version of the file content" * description 0..1 string "A description of the file" * type 1..1 code "The type of data contained in this file. Should be as detailed as possible, e.g., Whole Exome Variant Calls." 
* relatedFile 0..1 List "Provides a reference to another file that is related to this one" -* relatedFile.File 0..1 reference "The file to which this related file is related" -* relatedFile.Type 0..1 code "The relationship of the file to the parent file in reference" +* relatedFile.file 0..1 reference "The file to which this related file is related" +* relatedFile.type 0..1 code "The relationship of the file to the parent file in reference" CodeSystem: HashTypeCS Id: example-hash-type-code-system @@ -45,10 +45,19 @@ Description: "Explains the relationship of this file to the file of reference" * #has_data_dictionary "Has data dictionary" * #plink-type-associated-files "Plink-type associated files" +Extension: FileFormat +Id: file-format +Title: "The file format used" +Description: "The file format used" +* insert SetContext(DocumentReference) +* value[x] only CodeableConcept +* valueCodeableConcept from $edam (extensible) + Extension: FileSize Id: file-size Title: "The size of the file, e.g., in bytes." Description: "The size of the file, e.g., in bytes." 
+* insert SetContext(DocumentReference) * value[x] only Quantity * valueQuantity ^short = "Indicate the size of the file in reference" @@ -56,9 +65,36 @@ Extension: ContentVersion Id: content-version Title: "Version of the contents of the file" Description: "Version of the contents of the file" +* insert SetContext(DocumentReference) * value[x] only string * valueString ^short = "Indicate the version (e.g., V1) for the contents of this file" +Extension: HashValue +Id: hash-value +Title: "Value of hashing the file" +Description: "Value of hashing the file" +* insert SetContext(DocumentReference.extension) +* value[x] only string +* valueString ^short = "Value of hashing the file" + +Extension: HashType +Id: hash-type +Title: "Algorithm used to calculate the hash (and size, where applicable)" +Description: "Algorithm used to calculate the hash (and size, where applicable)" +* insert SetContext(DocumentReference.extension) +* value[x] only code +* valueCode ^short = "Algorithm used to calculate the hash (and size, where applicable)" + +Extension: HashExtension +Id: hash-extension +Title: "Provides a list of hashes for confirming file transfers" +Description: "Provides a list of hashes for confirming file transfers" +* insert SetContext(List) +* extension contains HashValue named hash-value 1..1 +* extension[hash-value] ^short = "Value of hashing the file" +* extension contains HashType named hash-type 1..1 +* extension[hash-type] ^short = "Algorithm used to calculate the hash (and size, where applicable)" + Profile: NcpiFile Parent: DocumentReference Id: ncpi-file @@ -70,13 +106,17 @@ Description: "Information about a file related to a research participant" * identifier ^short = "A related external file ID" * subject 0..1 * subject ^short = "The participant(s) for whom this file contains data (i.e., ParticipantID)" +* extension contains FileFormat named file-format 1..1 +* extension[file-format] ^short = "The file format used (EDAM is preferred)" * extension contains 
FileSize named file-size 1..1 * extension[file-size] ^short = "Indicate the size of the file in reference" * extension contains ContentVersion named content-version 0..1 * extension[content-version] ^short = "The version of the content in the file" +* extension contains HashExtension named hash 0..* * description 0..1 * description ^short = "A description of the file" -* type 0..1 +* type 0..1 +* type from $edam (extensible) * type ^short = "The type of data contained in this file." /* From 241af295ce003fb4a1c07bb6bb26c45f68663c87 Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Fri, 26 Jul 2024 16:51:22 -0400 Subject: [PATCH 04/10] ":sparkles: introduce EDAM code system and write complex extension for hash information" --- input/fsh/Alias.fsh | 3 +- input/fsh/examples/files.fsh | 20 +++++++++++++ input/fsh/modules/files.fsh | 54 +++++++++++++++++++++++++++++++----- 3 files changed, 69 insertions(+), 8 deletions(-) create mode 100644 input/fsh/examples/files.fsh diff --git a/input/fsh/Alias.fsh b/input/fsh/Alias.fsh index d16519f26..df414b7f7 100644 --- a/input/fsh/Alias.fsh +++ b/input/fsh/Alias.fsh @@ -7,4 +7,5 @@ Alias: $ncpi-data-access-type = https://nih-ncpi.github.io/ncpi-fhir-ig-2/CodeSy Alias: $title-type = http://terminology.hl7.org/CodeSystem/title-type Alias: $mesh = urn:oid:2.16.840.1.113883.6.177 -Alias: $ncit = http://purl.obolibrary.org/obo/ncit.owl \ No newline at end of file +Alias: $ncit = http://purl.obolibrary.org/obo/ncit.owl +Alias: $edam = http://edamontology.org \ No newline at end of file diff --git a/input/fsh/examples/files.fsh b/input/fsh/examples/files.fsh new file mode 100644 index 000000000..7b1dc1992 --- /dev/null +++ b/input/fsh/examples/files.fsh @@ -0,0 +1,20 @@ +Instance: PT-006SP660 +InstanceOf: NcpiFile +Title: "Example file based on CBTN" +Usage: #example +Description: "Use case of file information from CBTN" +* identifier.value = "PT_006SP660" +* subject = Reference(GF_6BAD9S7D) +* description = "Annotated Variant Call" +* 
type = $edam#operation_3227 "Variant calling" +* status = #current +* content[0].attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" +/** content[0].profile = $ncpi-data-access-type#controlled "Controlled"*/ +* extension[file-format].valueCodeableConcept.coding = $edam#format_3016 "VCF" +* extension[file-size] + * valueQuantity + * value = 1044770380 + * unit = "bytes" +* extension[hash] + * extension[hash-value].valueString = "8f107912d862cf91fbfb77bf9c1bab36-4" + * extension[hash-type].valueCode = #etag diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index 1c31f4799..e0b42fc88 100644 --- a/input/fsh/modules/files.fsh +++ b/input/fsh/modules/files.fsh @@ -10,18 +10,18 @@ Description: "The **Shared Data Model for File**" * fileExternalID 0..1 string "A related identifier of this file" * format 1..1 code "The file format used" * location 1..* List "List of locations where this data can be accessed" -* location.URI 1..1 uri "The URI at which this data can be accessed" -* location.AccessPolicy 0..* reference "If present, only those under the specific Access Policy can access the file in this location." +* location.uri 1..1 uri "The URI at which this data can be accessed" +* location.accessPolicy 0..* reference "If present, only those under the specific Access Policy can access the file in this location." * fileSize 1..1 Quantity "The size of the file, e.g., in bytes." 
* hash 0..* List "Provides a list of hashes for confirming file transfers" -* hash.Type 0..1 code "Algorithm used to calculate the hash (and size, where applicable)" -* hash.Value 1..1 string "Value of hashing the file" +* hash.type 0..1 code "Algorithm used to calculate the hash (and size, where applicable)" +* hash.value 1..1 string "Value of hashing the file" * contentVersion 0..1 string "Version of the file content" * description 0..1 string "A description of the file" * type 1..1 code "The type of data contained in this file. Should be as detailed as possible, e.g., Whole Exome Variant Calls." * relatedFile 0..1 List "Provides a reference to another file that is related to this one" -* relatedFile.File 0..1 reference "The file to which this related file is related" -* relatedFile.Type 0..1 code "The relationship of the file to the parent file in reference" +* relatedFile.file 0..1 reference "The file to which this related file is related" +* relatedFile.type 0..1 code "The relationship of the file to the parent file in reference" CodeSystem: HashTypeCS Id: example-hash-type-code-system @@ -45,10 +45,19 @@ Description: "Explains the relationship of this file to the file of reference" * #has_data_dictionary "Has data dictionary" * #plink-type-associated-files "Plink-type associated files" +Extension: FileFormat +Id: file-format +Title: "The file format used" +Description: "The file format used" +* insert SetContext(DocumentReference) +* value[x] only CodeableConcept +* valueCodeableConcept from $edam (extensible) + Extension: FileSize Id: file-size Title: "The size of the file, e.g., in bytes." Description: "The size of the file, e.g., in bytes." 
+* insert SetContext(DocumentReference) * value[x] only Quantity * valueQuantity ^short = "Indicate the size of the file in reference" @@ -56,9 +65,36 @@ Extension: ContentVersion Id: content-version Title: "Version of the contents of the file" Description: "Version of the contents of the file" +* insert SetContext(DocumentReference) * value[x] only string * valueString ^short = "Indicate the version (e.g., V1) for the contents of this file" +Extension: HashValue +Id: hash-value +Title: "Value of hashing the file" +Description: "Value of hashing the file" +* insert SetContext(DocumentReference.extension) +* value[x] only string +* valueString ^short = "Value of hashing the file" + +Extension: HashType +Id: hash-type +Title: "Algorithm used to calculate the hash (and size, where applicable)" +Description: "Algorithm used to calculate the hash (and size, where applicable)" +* insert SetContext(DocumentReference.extension) +* value[x] only code +* valueCode ^short = "Algorithm used to calculate the hash (and size, where applicable)" + +Extension: HashExtension +Id: hash-extension +Title: "Provides a list of hashes for confirming file transfers" +Description: "Provides a list of hashes for confirming file transfers" +* insert SetContext(List) +* extension contains HashValue named hash-value 1..1 +* extension[hash-value] ^short = "Value of hashing the file" +* extension contains HashType named hash-type 1..1 +* extension[hash-type] ^short = "Algorithm used to calculate the hash (and size, where applicable)" + Profile: NcpiFile Parent: DocumentReference Id: ncpi-file @@ -70,13 +106,17 @@ Description: "Information about a file related to a research participant" * identifier ^short = "A related external file ID" * subject 0..1 * subject ^short = "The participant(s) for whom this file contains data (i.e., ParticipantID)" +* extension contains FileFormat named file-format 1..1 +* extension[file-format] ^short = "The file format used (EDAM is preferred)" * extension contains 
FileSize named file-size 1..1 * extension[file-size] ^short = "Indicate the size of the file in reference" * extension contains ContentVersion named content-version 0..1 * extension[content-version] ^short = "The version of the content in the file" +* extension contains HashExtension named hash 0..* * description 0..1 * description ^short = "A description of the file" -* type 0..1 +* type 0..1 +* type from $edam (extensible) * type ^short = "The type of data contained in this file." /* From 593e4ca0ca7ef2dd1bdaa4e5b4ca27e93e6c79ee Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Tue, 30 Jul 2024 14:17:11 -0400 Subject: [PATCH 05/10] :sparkles: finish example, profile, introduce info page --- input/fsh/examples/files.fsh | 4 +- input/fsh/modules/files.fsh | 40 ++++++++++++------- ...ition-SharedDataModelResearchFile-intro.md | 11 +++++ 3 files changed, 39 insertions(+), 16 deletions(-) create mode 100644 input/pagecontent/StructureDefinition-SharedDataModelResearchFile-intro.md diff --git a/input/fsh/examples/files.fsh b/input/fsh/examples/files.fsh index 7b1dc1992..1327ecc9d 100644 --- a/input/fsh/examples/files.fsh +++ b/input/fsh/examples/files.fsh @@ -8,8 +8,8 @@ Description: "Use case of file information from CBTN" * description = "Annotated Variant Call" * type = $edam#operation_3227 "Variant calling" * status = #current -* content[0].attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" -/** content[0].profile = $ncpi-data-access-type#controlled "Controlled"*/ +* content.attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" +* extension[location-access].valueReference = Reference(NcpiResearchAccessPolicy.accessType) * extension[file-format].valueCodeableConcept.coding = $edam#format_3016 "VCF" * extension[file-size] * valueQuantity diff 
--git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index e0b42fc88..af2b08cf7 100644 --- a/input/fsh/modules/files.fsh +++ b/input/fsh/modules/files.fsh @@ -6,7 +6,7 @@ Logical: CdmFile Id: SharedDataModelFile Title: "Shared Data Model for File" Description: "The **Shared Data Model for File**" -* participantID 1..* reference "The participant(s) for whom this file contains data" +* participantID 1..1 reference "The participant(s) for whom this file contains data" * fileExternalID 0..1 string "A related identifier of this file" * format 1..1 code "The file format used" * location 1..* List "List of locations where this data can be accessed" @@ -23,6 +23,8 @@ Description: "The **Shared Data Model for File**" * relatedFile.file 0..1 reference "The file to which this related file is related" * relatedFile.type 0..1 code "The relationship of the file to the parent file in reference" +/* TODO Add Related file to metadata - AH 2024-07-30 */ + CodeSystem: HashTypeCS Id: example-hash-type-code-system Title: "Hash Types Code System" @@ -53,6 +55,14 @@ Description: "The file format used" * value[x] only CodeableConcept * valueCodeableConcept from $edam (extensible) +Extension: LocationAccess +Id: location-access +Title: "If present, only those under the specific Access Policy can access the file in this location." +Description: "If present, only those under the specific Access Policy can access the file in this location." +* insert SetContext(DocumentReference.content) +* value[x] only Reference +* valueReference ^short = "If present, only those under the specific Access Policy can access the file in this location." + Extension: FileSize Id: file-size Title: "The size of the file, e.g., in bytes." 
@@ -95,6 +105,8 @@ Description: "Provides a list of hashes for confirming file transfers" * extension contains HashType named hash-type 1..1 * extension[hash-type] ^short = "Algorithm used to calculate the hash (and size, where applicable)" +/** TODO Add Related file to metadata - AH 2024-07-30 */ + Profile: NcpiFile Parent: DocumentReference Id: ncpi-file @@ -102,23 +114,23 @@ Title: "NCPI File" Description: "Information about a file related to a research participant" * ^version = "0.0.1" * ^status = #draft -* identifier 0..* +* identifier 0..* /*File External ID*/ * identifier ^short = "A related external file ID" -* subject 0..1 +* subject 0..1 /*Participant*/ * subject ^short = "The participant(s) for whom this file contains data (i.e., ParticipantID)" -* extension contains FileFormat named file-format 1..1 +* extension contains FileFormat named file-format 1..1 /*File Format*/ * extension[file-format] ^short = "The file format used (EDAM is preferred)" -* extension contains FileSize named file-size 1..1 +* content.attachment.url 1..1 /*Location uri*/ +* content.attachment.url ^short = "The URI at which this data can be accessed" +* extension contains LocationAccess named location-access 0..* /*Location Access Policy*/ +* extension[location-access] ^short = "If present, only those under the specific Access Policy can access the file in this location." 
+* extension contains FileSize named file-size 1..1 /*File Size*/ * extension[file-size] ^short = "Indicate the size of the file in reference" -* extension contains ContentVersion named content-version 0..1 +* extension contains HashExtension named hash 0..* /*Hash (contains type and value)*/ +* extension contains ContentVersion named content-version 0..1 /*Content Version*/ * extension[content-version] ^short = "The version of the content in the file" -* extension contains HashExtension named hash 0..* -* description 0..1 +* description 0..1 /*Description*/ * description ^short = "A description of the file" -* type 0..1 +* type 0..1 /*File Type*/ * type from $edam (extensible) -* type ^short = "The type of data contained in this file." - -/* -CodeSystem for EDAM-- link it as an external code system? -*/ \ No newline at end of file +* type ^short = "The type of data contained in this file." \ No newline at end of file diff --git a/input/pagecontent/StructureDefinition-SharedDataModelResearchFile-intro.md b/input/pagecontent/StructureDefinition-SharedDataModelResearchFile-intro.md new file mode 100644 index 000000000..e8613a977 --- /dev/null +++ b/input/pagecontent/StructureDefinition-SharedDataModelResearchFile-intro.md @@ -0,0 +1,11 @@ +### NCPI File +#### Introduction +Files are a common research product. In this straightforward representation, we provide basic details of the file and how to access it. Details about what is contained in the file or how the content was generated should be described with other entities, such as data dictionaries, summaries, or assays. + +#### File Definitions +File contains basic file metadata about the file location and contents. Files are typically associated with one or more participants, though they can also include general study documents. The file content may have different access control restrictions when compared to this entity, which is only the file metadata. 
+ +There can be multiple file location references, for example DRS and cloud storage references, though the access approaches for those locations should be reasonably apparent through the Access Policy for the file content. + +#### Example +If a data file is ONLY accessible through DRS, the underlying bucket locations should not be included here as no user would be able to access them directly. However, if there are multiple Access Policies that provide routes to access the data through different URIs, those can be included. Controlled access release via DRS with a consortium access model permitting direct bucket access could both be stated here to permit consistent reference to the File irrespective of the access mechanism. \ No newline at end of file From 677981a53b78a03906196d883f13529a48972eff Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Tue, 30 Jul 2024 14:34:45 -0400 Subject: [PATCH 06/10] :recycle: restore completeness after making push/pull mistake --- input/fsh/examples/files.fsh | 20 ++++++++++++ input/fsh/modules/files.fsh | 63 +++++++++++++++++++++++++++++++++--- 2 files changed, 78 insertions(+), 5 deletions(-) diff --git a/input/fsh/examples/files.fsh b/input/fsh/examples/files.fsh index e69de29bb..e398be7d2 100644 --- a/input/fsh/examples/files.fsh +++ b/input/fsh/examples/files.fsh @@ -0,0 +1,20 @@ +Instance: PT-006SP660 +InstanceOf: NcpiFile +Title: "Example file based on CBTN" +Usage: #example +Description: "Use case of file information from CBTN" +* identifier.value = "PT_006SP660" +* subject = Reference(GF_6BAD9S7D) +* description = "Annotated Variant Call" +* type = $edam#operation_3227 "Variant calling" +* status = #current +* content.attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" +* extension[location-access].valueReference = Reference(NcpiResearchAccessPolicy.accessType) +* 
extension[file-format].valueCodeableConcept.coding = $edam#format_3016 "VCF" +* extension[file-size] + * valueQuantity + * value = 1044770380 + * unit = "bytes" +* extension[hash] + * extension[hash-value].valueString = "8f107912d862cf91fbfb77bf9c1bab36-4" + * extension[hash-type].valueCode = #etag \ No newline at end of file diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index d743c5643..af2b08cf7 100644 --- a/input/fsh/modules/files.fsh +++ b/input/fsh/modules/files.fsh @@ -20,8 +20,10 @@ Description: "The **Shared Data Model for File**" * description 0..1 string "A description of the file" * type 1..1 code "The type of data contained in this file. Should be as detailed as possible, e.g., Whole Exome Variant Calls." * relatedFile 0..1 List "Provides a reference to another file that is related to this one" -* relatedFile.File 0..1 reference "The file to which this related file is related" -* relatedFile.Type 0..1 code "The relationship of the file to the parent file in reference" +* relatedFile.file 0..1 reference "The file to which this related file is related" +* relatedFile.type 0..1 code "The relationship of the file to the parent file in reference" + +/* TODO Add Related file to metadata - AH 2024-07-30 */ CodeSystem: HashTypeCS Id: example-hash-type-code-system @@ -45,6 +47,22 @@ Description: "Explains the relationship of this file to the file of reference" * #has_data_dictionary "Has data dictionary" * #plink-type-associated-files "Plink-type associated files" +Extension: FileFormat +Id: file-format +Title: "The file format used" +Description: "The file format used" +* insert SetContext(DocumentReference) +* value[x] only CodeableConcept +* valueCodeableConcept from $edam (extensible) + +Extension: LocationAccess +Id: location-access +Title: "If present, only those under the specific Access Policy can access the file in this location." 
+Description: "If present, only those under the specific Access Policy can access the file in this location." +* insert SetContext(DocumentReference.content) +* value[x] only Reference +* valueReference ^short = "If present, only those under the specific Access Policy can access the file in this location." + Extension: FileSize Id: file-size Title: "The size of the file, e.g., in bytes." @@ -61,6 +79,34 @@ Description: "Version of the contents of the file" * value[x] only string * valueString ^short = "Indicate the version (e.g., V1) for the contents of this file" +Extension: HashValue +Id: hash-value +Title: "Value of hashing the file" +Description: "Value of hashing the file" +* insert SetContext(DocumentReference.extension) +* value[x] only string +* valueString ^short = "Value of hashing the file" + +Extension: HashType +Id: hash-type +Title: "Algorithm used to calculate the hash (and size, where applicable)" +Description: "Algorithm used to calculate the hash (and size, where applicable)" +* insert SetContext(DocumentReference.extension) +* value[x] only code +* valueCode ^short = "Algorithm used to calculate the hash (and size, where applicable)" + +Extension: HashExtension +Id: hash-extension +Title: "Provides a list of hashes for confirming file transfers" +Description: "Provides a list of hashes for confirming file transfers" +* insert SetContext(List) +* extension contains HashValue named hash-value 1..1 +* extension[hash-value] ^short = "Value of hashing the file" +* extension contains HashType named hash-type 1..1 +* extension[hash-type] ^short = "Algorithm used to calculate the hash (and size, where applicable)" + +/** TODO Add Related file to metadata - AH 2024-07-30 */ + Profile: NcpiFile Parent: DocumentReference Id: ncpi-file @@ -72,12 +118,19 @@ Description: "Information about a file related to a research participant" * identifier ^short = "A related external file ID" * subject 0..1 /*Participant*/ * subject ^short = "The participant(s) for whom 
this file contains data (i.e., ParticipantID)" -* extension contains FileSize named file-size 1..1 +* extension contains FileFormat named file-format 1..1 /*File Format*/ +* extension[file-format] ^short = "The file format used (EDAM is preferred)" +* content.attachment.url 1..1 /*Location uri*/ +* content.attachment.url ^short = "The URI at which this data can be accessed" +* extension contains LocationAccess named location-access 0..* /*Location Access Policy*/ +* extension[location-access] ^short = "If present, only those under the specific Access Policy can access the file in this location." +* extension contains FileSize named file-size 1..1 /*File Size*/ * extension[file-size] ^short = "Indicate the size of the file in reference" * extension contains HashExtension named hash 0..* /*Hash (contains type and value)*/ * extension contains ContentVersion named content-version 0..1 /*Content Version*/ * extension[content-version] ^short = "The version of the content in the file" -* description 0..1 +* description 0..1 /*Description*/ * description ^short = "A description of the file" -* type 0..1 +* type 0..1 /*File Type*/ +* type from $edam (extensible) * type ^short = "The type of data contained in this file." 
\ No newline at end of file From a742247a4a66dd04a9d7d2664b48a12c8bfede8e Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Tue, 30 Jul 2024 16:07:59 -0400 Subject: [PATCH 07/10] :recycle: make EDAM for file format a value set --- input/fsh/modules/files.fsh | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index af2b08cf7..006f39121 100644 --- a/input/fsh/modules/files.fsh +++ b/input/fsh/modules/files.fsh @@ -47,6 +47,12 @@ Description: "Explains the relationship of this file to the file of reference" * #has_data_dictionary "Has data dictionary" * #plink-type-associated-files "Plink-type associated files" +ValueSet: EDAMOntologyTerms +Id: edam-ontology-terms +Title: "Enumerations for the EDAM ontology" +Description: "Enumerations for the EDAM ontology" +* include codes from system $edam + Extension: FileFormat Id: file-format Title: "The file format used" @@ -132,5 +138,5 @@ Description: "Information about a file related to a research participant" * description 0..1 /*Description*/ * description ^short = "A description of the file" * type 0..1 /*File Type*/ -* type from $edam (extensible) +* type from edam-ontology-terms (extensible) * type ^short = "The type of data contained in this file." 
\ No newline at end of file From cd3a66622e841b8f01de6dfb44c4eb15fcea27bf Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Thu, 1 Aug 2024 16:10:47 -0400 Subject: [PATCH 08/10] :memo: added information page about logical model --- .../StructureDefinition-ncpi-file-intro.md | 37 +++++++++++++++++++ 1 file changed, 37 insertions(+) create mode 100644 input/pagecontent/StructureDefinition-ncpi-file-intro.md diff --git a/input/pagecontent/StructureDefinition-ncpi-file-intro.md b/input/pagecontent/StructureDefinition-ncpi-file-intro.md new file mode 100644 index 000000000..130d1c911 --- /dev/null +++ b/input/pagecontent/StructureDefinition-ncpi-file-intro.md @@ -0,0 +1,37 @@ +#### Key Guidelines +The NCPI File profile is based on the standard resource type, [DocumentReference](https://hl7.org/fhir/r4/documentreference.html) and is intended to represent the files associated with a participant in a research study. + +##### Added Profile Restrictions +In order to ensure that our resources are interoperable across studies, we have employed a number of restrictions that should make consuming Patient resources more consistent. + + +* participantID **should** be a globally unique identifier associated with the patient. This practice is intended to make constructing queries for the same patient compatible across different servers (such as QA vs PROD) but also to make the resource URLs more meaningful. + +* fileExternalID **should** have all appropriate Identifiers with a meaningful system/value pair. Such identifiers may include dbGaP accession IDs, global and external IDs, etc. + +* format and relatedFile.type **should** use [EDAM](https://edamontology.org/) terminology (i.e., codes) when available. Other file type code systems are allowed if a suitable EDAM code does not exist. 
+ + +#### Recommended Practices +TODO: Write Recommended Practices + +##### FHIR Mappings +The following fields from the shared data model are to be mapped to the NCPI File as shown below: + +| **Logical Model Property** | **Cardinality** | **NCPI File Mapping** | **Usage Guidance** | **Notes**| +participantID|1..1|VSReference(5.1.0)|The participant(s) for whom this file contains data| +fileExternalID|0..1|string|A related identifier of this file| +format|1..1|code|The file format used| +location|1..*|List|List of locations where this data can be accessed| +location.uri|1..1|uri|The URI at which this data can be accessed| +location.accessPolicy|0..*|VSReference(5.1.0)|If present, only those under the specific Access Policy can access the file in this location.| +fileSize|1..1|Quantity|The size of the file, e.g., in bytes.| +hash|0..*|List|Provides a list of hashes for confirming file transfers| +hash.type|0..1|code|Algorithm used to calculate the hash (and size, where applicable)| +hash.value|1..1|string|Value of hashing the file| +contentVersion|0..1|string|Version of the file content| +description|0..1|string|A description of the file| +type|1..1|code|The type of data contained in this file. 
Should be as detailed as possible, e.g., Whole Exome Variant Calls.| +relatedFile|0..1|List|Provides a reference to another file that is related to this one| +relatedFile.file|0..1|VSReference(5.1.0)|The file to which this related file is related| +relatedFile.type|0..1|code|The relationship of the file to the parent file in reference| From b48abde749ab7fe32d147fa0ffb904ba1f0356ee Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Thu, 1 Aug 2024 19:21:34 -0400 Subject: [PATCH 09/10] :adhesive_bandage: correct intro table and redeploy IG preview --- input/fsh/examples/files.fsh | 6 ++-- input/fsh/modules/files.fsh | 2 +- .../StructureDefinition-ncpi-file-intro.md | 34 +++++++++---------- 3 files changed, 22 insertions(+), 20 deletions(-) diff --git a/input/fsh/examples/files.fsh b/input/fsh/examples/files.fsh index e398be7d2..d9ce16110 100644 --- a/input/fsh/examples/files.fsh +++ b/input/fsh/examples/files.fsh @@ -7,9 +7,11 @@ Description: "Use case of file information from CBTN" * subject = Reference(GF_6BAD9S7D) * description = "Annotated Variant Call" * type = $edam#operation_3227 "Variant calling" +* extension[content-version].valueString = "V1" * status = #current -* content.attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" -* extension[location-access].valueReference = Reference(NcpiResearchAccessPolicy.accessType) +* content[+] + * attachment.url = "s3://kf-strides-study-us-east-1-prd-sd-54g4wg4r/harmonized-data/family-variants/155bb529-2e7b-474f-ba24-cd0656d5f3d0.CGP.filtered.deNovo.vep.vcf.gz" + * extension[location-access].valueReference = Reference(kf-gru-dac-consent) * extension[file-format].valueCodeableConcept.coding = $edam#format_3016 "VCF" * extension[file-size] * valueQuantity diff --git a/input/fsh/modules/files.fsh b/input/fsh/modules/files.fsh index 006f39121..9e0921ce2 100644 --- a/input/fsh/modules/files.fsh +++ 
b/input/fsh/modules/files.fsh @@ -105,7 +105,7 @@ Extension: HashExtension Id: hash-extension Title: "Provides a list of hashes for confirming file transfers" Description: "Provides a list of hashes for confirming file transfers" -* insert SetContext(List) +* insert SetContext(DocumentReference) * extension contains HashValue named hash-value 1..1 * extension[hash-value] ^short = "Value of hashing the file" * extension contains HashType named hash-type 1..1 diff --git a/input/pagecontent/StructureDefinition-ncpi-file-intro.md b/input/pagecontent/StructureDefinition-ncpi-file-intro.md index 130d1c911..cc023d3d2 100644 --- a/input/pagecontent/StructureDefinition-ncpi-file-intro.md +++ b/input/pagecontent/StructureDefinition-ncpi-file-intro.md @@ -18,20 +18,20 @@ TDOD: Write Recommended Practices ##### FHIR Mappings The following fields from the shared data model are to be mapped to the NCPI File as shown below: -| **Logical Model Property** | **Cardinality** | **NCPI File Mapping** | **Usage Guidance** | **Notes**| -participantID|1..1|VSReference(5.1.0)|The participant(s) for whom this file contains data| -fileExternalID|0..1|string|A related identifier of this file| -format|1..1|code|The file format used| -location|1..*|List|List of locations where this data can be accessed| -location.uri|1..1|uri|The URI at which this data can be accessed| -location.accessPolicy|0..*|VSReference(5.1.0)|If present, only those under the specific Access Policy can access the file in this location.| -fileSize|1..1|Quantity|The size of the file, e.g., in bytes.| -hash|0..*|List|Provides a list of hashes for confirming file transfers| -hash.type|0..1|code|Algorithm used to calculate the hash (and size, where applicable)| -hash.value|1..1|string|Value of hashing the file| -contentVersion|0..1|string|Version of the file content| -description|0..1|string|A description of the file| -type|1..1|code|The type of data contained in this file. 
Should be as detailed as possible, e.g., Whole Exome Variant Calls.| -relatedFile|0..1|List|Provides a reference to another file that is related to this one| -relatedFile.file|0..1|VSReference(5.1.0)|The file to which this related file is related| -relatedFile.type|0..1|code|The relationship of the file to the parent file in reference| +| **Logical Model Property** | **Cardinality** | **NCPI FHIR Mapping** | **Usage Guidance** | **Notes**| +participantID|1..1|identifier.value|The participant(s) for whom this file contains data| +fileExternalID|0..1|subject|A related identifier of this file| +format|1..1|codextension[file-format].valueCodeableConcept.codinge|The file format used| +location|1..*|content|List of locations where this data can be accessed| +location.uri|1..1|content.attachment.url|The URI at which this data can be accessed| +location.accessPolicy|0..*|content.extension[location-access].valueReference|If present, only those under the specific Access Policy can access the file in this location.| +fileSize|1..1|extension[file-size].valueQuantity.value, extension[file-size].valueQuantity.unit|The size of the file, e.g., in bytes.| +hash|0..*|extension[hash]|Provides a list of hashes for confirming file transfers| +hash.type|0..1|extension[hash-type].valueCode|Algorithm used to calculate the hash (and size, where applicable)| +hash.value|1..1|extension[hash-value].valueString|Value of hashing the file| +contentVersion|0..1|extension[content-version].valueString|Version of the file content| +description|0..1|description|A description of the file| +type|1..1|type|The type of data contained in this file. 
Should be as detailed as possible, e.g., Whole Exome Variant Calls.| +relatedFile|0..1|TODO|Provides a reference to another file that is related to this one| +relatedFile.file|0..1|TODO|The file to which this related file is related| +relatedFile.type|0..1|TODO|The relationship of the file to the parent file in reference| From f4de841b3e1ef38e9c3c474f21cecce0b25c3530 Mon Sep 17 00:00:00 2001 From: Ann Holmes Date: Thu, 8 Aug 2024 08:08:28 -0400 Subject: [PATCH 10/10] :memo: correct file name inconsistency and add notation about File Metadata connections --- ...o.md => StructureDefinition-SharedDataModelFile-intro.md} | 0 input/pagecontent/StructureDefinition-ncpi-file-intro.md | 5 ++++- 2 files changed, 4 insertions(+), 1 deletion(-) rename input/pagecontent/{StructureDefinition-SharedDataModelResearchFile-intro.md => StructureDefinition-SharedDataModelFile-intro.md} (100%) diff --git a/input/pagecontent/StructureDefinition-SharedDataModelResearchFile-intro.md b/input/pagecontent/StructureDefinition-SharedDataModelFile-intro.md similarity index 100% rename from input/pagecontent/StructureDefinition-SharedDataModelResearchFile-intro.md rename to input/pagecontent/StructureDefinition-SharedDataModelFile-intro.md diff --git a/input/pagecontent/StructureDefinition-ncpi-file-intro.md b/input/pagecontent/StructureDefinition-ncpi-file-intro.md index cc023d3d2..414123d81 100644 --- a/input/pagecontent/StructureDefinition-ncpi-file-intro.md +++ b/input/pagecontent/StructureDefinition-ncpi-file-intro.md @@ -21,7 +21,7 @@ The following fields from the shared data model are to be mapped to the NCPI Fil | **Logical Model Property** | **Cardinality** | **NCPI FHIR Mapping** | **Usage Guidance** | **Notes**| participantID|1..1|identifier.value|The participant(s) for whom this file contains data| fileExternalID|0..1|subject|A related identifier of this file| -format|1..1|codextension[file-format].valueCodeableConcept.codinge|The file format used| 
+format|1..1|extension[file-format].valueCodeableConcept.coding|The file format used| location|1..*|content|List of locations where this data can be accessed| location.uri|1..1|content.attachment.url|The URI at which this data can be accessed| location.accessPolicy|0..*|content.extension[location-access].valueReference|If present, only those under the specific Access Policy can access the file in this location.| @@ -35,3 +35,6 @@ type|1..1|type|The type of data contained in this file. Should be as detailed as relatedFile|0..1|TODO|Provides a reference to another file that is related to this one| relatedFile.file|0..1|TODO|The file to which this related file is related| relatedFile.type|0..1|TODO|The relationship of the file to the parent file in reference| + +##### Note on Related Files +The related-file fields fall under the scope of file metadata, a module that had not yet been written when the Files module was defined. Once a file metadata module is written, the FHIR mappings for relatedFile in this IG should be updated. \ No newline at end of file
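
As a rough illustration of how the mapping table above might look on an actual resource, here is a minimal FSH sketch of an `NcpiFile` instance that exercises the file-format, file-size, hash, and location mappings. It is not part of the patch series. The instance name, bucket URL, file size, and hash value are hypothetical placeholders, the `$edam` alias is assumed to be defined elsewhere in the repository, and the element paths assume the extensions resolve as the profile and table describe.

```fsh
// Hypothetical example instance; names and values below are placeholders.
Instance: example-ncpi-file-with-hash
InstanceOf: NcpiFile
Usage: #example
Description: "Sketch of a file with one location and one MD5 hash"
* status = #current
// type -> type (EDAM code taken from the example in this patch series)
* type = $edam#operation_3227 "Variant calling"
// format -> extension[file-format].valueCodeableConcept.coding
* extension[file-format].valueCodeableConcept.coding = $edam#format_3016 "VCF"
// fileSize -> extension[file-size].valueQuantity.value / .unit
* extension[file-size].valueQuantity.value = 1024
* extension[file-size].valueQuantity.unit = "bytes"
// hash -> extension[hash], with nested hash-type and hash-value
* extension[hash][+].extension[hash-type].valueCode = #md5
* extension[hash][=].extension[hash-value].valueString = "d41d8cd98f00b204e9800998ecf8427e"
// location.uri -> content.attachment.url (placeholder URL)
* content[+].attachment.url = "s3://example-bucket/example.vcf.gz"
```

The `[+]` and `[=]` soft indices keep the hash type and hash value on the same repetition of the `hash` extension, which matters once a file carries more than one hash.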