From b33671035a40a03db8d646caae6702b359cb7e9a Mon Sep 17 00:00:00 2001
From: Heinrich Lukas Weil
Date: Tue, 23 Jan 2024 17:48:25 +0100
Subject: [PATCH 1/4] start inclusion of data selectors with some links

---
 ISA-XLSX.md | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/ISA-XLSX.md b/ISA-XLSX.md
index bf4caf6..82edff3 100644
--- a/ISA-XLSX.md
+++ b/ISA-XLSX.md
@@ -620,15 +620,27 @@ Each annotation table sheet MUST contain at most one `Input` and at most one `Ou

 - An `Extract Material` MUST be indicated with the node type `Material Name`.

-- An `Image File` MUST be indicated with the node type `Image File`.
+- A `Data` object MUST be indicated with the node type `Data`.

-- A `Raw Data File` MUST be indicated with the node type `Raw Data File`.
+`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

-- A `Derived Data File` MUST be indicated with the node type `Derived Data File`.
+The `Data` node type MUST correspond to a relevant data resource location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns. If the annotation of the `Data` node refers not to the complete resource, but a part of it, a `Selector` MAY added. This Selector MUST be separated from the location using a `#`— with no whitespace between: `location#selector`.

-`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.
+`Data Format` SHOULD be expressed using a [MIME format](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: `type/subtype`.
If appropriate, a format from the list composed by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml)
+SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.
+
+## Examples
+
+In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`
+
+Use of `general pattern` relative paths from the arc root folder:
+
+`assays/Assay1/isa.assay.xlsx`:

-`Image File`, `Raw Data File` or `Derived Data File` node types MUST correspond to a relevant file location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns.
+| Input [Sample Name] | Output [Data] | Output Data Format | Output Data Selector |
+|-------------|---------------------------------|----------------------------------|--|
+| input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
+| input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |

 ## Protocol Columns

From 91644edfd2d257d828899ff52f24dcf1a713b60f Mon Sep 17 00:00:00 2001
From: Heinrich Lukas Weil
Date: Wed, 24 Jan 2024 09:27:38 +0100
Subject: [PATCH 2/4] finish up first draft of new data columns

---
 ISA-XLSX.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/ISA-XLSX.md b/ISA-XLSX.md
index 82edff3..6f208a6 100644
--- a/ISA-XLSX.md
+++ b/ISA-XLSX.md
@@ -624,11 +624,15 @@ Each annotation table sheet MUST contain at most one `Input` and at most one `Ou

 `Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

-The `Data` node type MUST correspond to a relevant data resource location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns.
If the annotation of the `Data` node refers not to the complete resource, but a part of it, a `Selector` MAY added. This Selector MUST be separated from the location using a `#`— with no whitespace between: `location#selector`.
+The `Data` node type MUST correspond to a relevant data resource location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns. If the annotation of the `Data` node refers not to the complete resource, but a part of it, a `Selector` MAY be added. This Selector MUST be separated from the resource location using a `#`— with no whitespace between: `location#selector`. If appropriate, the Selector SHOULD be formatted according to IRI fragment selectors specified by [W3](https://www.w3.org/TR/annotation-model/#fragment-selector).

-`Data Format` SHOULD be expressed using a [MIME format](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: `type/subtype`. If appropriate, a format from the list composed by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml)
+The format of the data resource MAY be further qualified using a `Data Format` column. The `Data Format` SHOULD be expressed using a [MIME format](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: `type/subtype`. If appropriate, a format from the list composed by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml)
 SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.

+The format and usage info about the selector MAY be further qualified using a `Data Selector` column.
The `Data Selector` SHOULD point to a web resource containing instructions about how the selector is formatted and how it should be interpreted.
+
+
+
 ## Examples

 In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`
@@ -637,7 +641,7 @@ Use of `general pattern` relative paths from the arc root folder:

 `assays/Assay1/isa.assay.xlsx`:

-| Input [Sample Name] | Output [Data] | Output Data Format | Output Data Selector |
+| Input [Sample Name] | Output [Data] | Data Format | Data Selector |
 |-------------|---------------------------------|----------------------------------|--|
 | input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
 | input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |

From 6a4eefa1dc9afe20c1e87f73a86b1ed5cc824282 Mon Sep 17 00:00:00 2001
From: Heinrich Lukas Weil
Date: Wed, 24 Jan 2024 09:52:09 +0100
Subject: [PATCH 3/4] adjust examples according to data specification changes

---
 ARC specification.md | 8 +++++---
 ISA-XLSX.md | 8 +++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index b4b61d5..1ac80d7 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -245,19 +245,21 @@ All metadata references to files or directories located inside the ARC MUST foll

 #### Examples

-In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`
+##### General Pattern
+
+In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Data`. `Assay2` references this `Data` for producing a new `Data`.

 Use of `general pattern` relative paths from the arc root folder:

 `assays/Assay1/isa.assay.xlsx`:

-| Input [Source Name] | Parameter[Instrument model] | Output [Raw Data File] |
+| Input [Source Name] | Parameter[Instrument model] | Output [Data] |
 |-------------|---------------------------------|----------------------------------|
 | input | Bruker 500 Avance | assays/Assay1/dataset/measurement.txt |

 `assays/Assay2/isa.assay.xlsx`:

-| Input [Raw Data File] | Parameter[script file] | Output [Derived Data File] |
+| Input [Data] | Parameter[script file] | Output [Data] |
 |----------------------------------|---------------------------------|----------------------------------|
 | assays/Assay1/dataset/measurement.txt | assays/Assay2/dataset/script.sh | assays/Assay2/dataset/result.txt |

diff --git a/ISA-XLSX.md b/ISA-XLSX.md
index 6f208a6..2412e68 100644
--- a/ISA-XLSX.md
+++ b/ISA-XLSX.md
@@ -629,17 +629,15 @@ The `Data` node type MUST correspond to a relevant data resource location, follo
 The format of the data resource MAY be further qualified using a `Data Format` column. The `Data Format` SHOULD be expressed using a [MIME format](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: `type/subtype`. If appropriate, a format from the list composed by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml)
 SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.

-The format and usage info about the selector MAY be further qualified using a `Data Selector` column. The `Data Selector` SHOULD point to a web resource containing instructions about how the selector is formatted and how it should be interpreted.
-
+The format and usage info about the Selector MAY be further qualified using a `Data Selector` column.
The `Data Selector` SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.


 ## Examples

-In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`
+### Data Location and Selector

-Use of `general pattern` relative paths from the arc root folder:
+In this example, there is a measurement of two `Samples`, namely `input1` and `input2`. The values measured are both written into the same data resource in the location `result.csv`, whichs formatting is tabular, according to the `Data Format` being `text/csv`. To distinguish between the measurement values stemming from the different inputs, selectors were added to the resource location (seperated by a `#`), namely `col=1` and `col=2`. The specification about the formatting of these selectors can be found in the provided link, namely `https://datatracker.ietf.org/`.

-`assays/Assay1/isa.assay.xlsx`:

 | Input [Sample Name] | Output [Data] | Data Format | Data Selector |
 |-------------|---------------------------------|----------------------------------|--|

From 1f6e246df6d7c1b98c5d4a3206340d130cde6a39 Mon Sep 17 00:00:00 2001
From: Heinrich Lukas Weil
Date: Wed, 24 Jan 2024 10:12:08 +0100
Subject: [PATCH 4/4] change Data Selector to Data Selector Format

---
 ISA-XLSX.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ISA-XLSX.md b/ISA-XLSX.md
index 2412e68..04d8e8a 100644
--- a/ISA-XLSX.md
+++ b/ISA-XLSX.md
@@ -629,7 +629,7 @@ The `Data` node type MUST correspond to a relevant data resource location, follo
 The format of the data resource MAY be further qualified using a `Data Format` column.
The `Data Format` SHOULD be expressed using a [MIME format](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: `type/subtype`. If appropriate, a format from the list composed by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml)
 SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.

-The format and usage info about the Selector MAY be further qualified using a `Data Selector` column. The `Data Selector` SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.
+The format and usage info about the Selector MAY be further qualified using a `Data Selector Format` column. The `Data Selector Format` SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.


 ## Examples
@@ -639,7 +639,7 @@ The format and usage info about the Selector MAY be further qualified using a `D
 In this example, there is a measurement of two `Samples`, namely `input1` and `input2`. The values measured are both written into the same data resource in the location `result.csv`, whichs formatting is tabular, according to the `Data Format` being `text/csv`. To distinguish between the measurement values stemming from the different inputs, selectors were added to the resource location (seperated by a `#`), namely `col=1` and `col=2`. The specification about the formatting of these selectors can be found in the provided link, namely `https://datatracker.ietf.org/`.


-| Input [Sample Name] | Output [Data] | Data Format | Data Selector |
+| Input [Sample Name] | Output [Data] | Data Format | Data Selector Format |
 |-------------|---------------------------------|----------------------------------|--|
 | input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
 | input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
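
Review note: the `location#selector` rule and the `type/subtype` shape introduced by this series could be checked by a consumer roughly as follows. This is a minimal sketch in Python, not part of the patch or of any existing ARC tooling. The names `DataCell`, `split_data_cell`, `looks_like_mime` and `csv_column_index` are hypothetical. The choice to split at the first `#` is an assumption, since the specification text does not say which `#` applies when a location itself contains one. The MIME check only tests the general `type/subtype` shape and does not consult the IANA registry.

```python
import re
from typing import NamedTuple, Optional


class DataCell(NamedTuple):
    """Parsed content of a `Data` node cell (hypothetical helper type)."""
    location: str
    selector: Optional[str]


def split_data_cell(cell: str) -> DataCell:
    """Split `location#selector`; selector is None when no '#' is present.

    Assumption: the first '#' separates location and selector.
    """
    location, sep, selector = cell.partition("#")
    return DataCell(location, selector if sep else None)


# Rough shape check for `type/subtype` with no whitespace around the slash.
# Assumption: this is not a full media-type validator.
_MIME_SHAPE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*$"
)


def looks_like_mime(value: str) -> bool:
    """Return True when the value has the general `type/subtype` shape."""
    return _MIME_SHAPE.match(value) is not None


def csv_column_index(selector: str) -> Optional[int]:
    """Return N for the simple `col=N` selector form used in the example table.

    Any other selector (ranges, rows, cells) yields None in this sketch.
    """
    match = re.fullmatch(r"col=(\d+)", selector)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    cell = split_data_cell("result.csv#col=1")
    print(cell.location, cell.selector, looks_like_mime("text/csv"))
```

Returning `None` for a missing selector keeps cells without a `#`, such as `assays/Assay1/dataset/measurement.txt`, valid, which matches the MAY wording for selectors.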