Merge pull request #93 from nfdi4plants/selector

Rework Data Nodes
nfdi4plants · Feb 8, 2024 · 604a083
2 parents 01d5483 + 1f6e246
commit 604a083

Showing 2 changed files with 24 additions and 8 deletions.
diff --git a/ARC specification.md b/ARC specification.md
@@ -255,19 +255,21 @@ All metadata references to files or directories located inside the ARC MUST foll
 
 ### Examples
 
-In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`
+##### General Pattern
+
+In this example, there are two `assays`. `Assay1` contains a measurement of a `Source` material and produces an output `Data`. `Assay2` references this `Data` to produce a new `Data`.
 
 Use of `general pattern` relative paths from the arc root folder:
 
 `assays/Assay1/isa.assay.xlsx`:
 
-| Input [Source Name] | Parameter[Instrument model]          | Output [Raw Data File] | 
+| Input [Source Name] | Parameter[Instrument model]          | Output [Data] | 
 |-------------|---------------------------------|----------------------------------|
 | input       | Bruker 500 Avance | assays/Assay1/dataset/measurement.txt |
 
 `assays/Assay2/isa.assay.xlsx`:
 
-| Input [Raw Data File] | Parameter[script file]          | Output [Derived Data File] |
+| Input [Data] | Parameter[script file]          | Output [Data] |
 |----------------------------------|---------------------------------|----------------------------------|
 | assays/Assay1/dataset/measurement.txt | assays/Assay2/dataset/script.sh | assays/Assay2/dataset/result.txt |
 

diff --git a/ISA-XLSX.md b/ISA-XLSX.md
@@ -620,15 +620,29 @@ Each annotation table sheet MUST contain at most one `Input` and at most one `Ou
 
 - An `Extract Material` MUST be indicated with the node type `Material Name`.
 
-- An `Image File` MUST be indicated with the node type `Image File`.
+- A `Data` object MUST be indicated with the node type `Data`.
 
-- A `Raw Data File` MUST be indicated with the node type `Raw Data File`.
+`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.
 
-- A `Derived Data File` MUST be indicated with the node type `Derived Data File`.
+The `Data` node type MUST correspond to a relevant data resource location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns. If the annotation of the `Data` node refers not to the complete resource but only to a part of it, a `Selector` MAY be added. This Selector MUST be separated from the resource location using a `#`, with no whitespace in between: `location#selector`. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the [W3C](https://www.w3.org/TR/annotation-model/#fragment-selector).
+
+The format of the data resource MAY be further qualified using a `Data Format` column. The `Data Format` SHOULD be expressed using a [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts, a type and a subtype, separated by a slash (/) with no whitespace in between: `type/subtype`. If appropriate, a format from the list composed by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml)
+SHOULD be picked. Unregistered or niche encodings and file formats MAY be indicated instead via the most appropriate URL.
+
+The format and usage info about the Selector MAY be further qualified using a `Data Selector Format` column. The `Data Selector Format` SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.
+
+
+## Examples
+
+### Data Location and Selector
+
+In this example, two `Samples`, `input1` and `input2`, are measured. Both measured values are written to the same data resource at the location `result.csv`, which is tabular, as indicated by the `Data Format` `text/csv`. To distinguish the values stemming from the different inputs, selectors are appended to the resource location (separated by a `#`), namely `col=1` and `col=2`. The specification for the formatting of these selectors can be found at the provided link, namely `https://datatracker.ietf.org/`.
 
-`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.
 
-`Image File`, `Raw Data File` or `Derived Data File` node types MUST correspond to a relevant file location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns.
+| Input [Sample Name] | Output [Data]          | Data Format | Data Selector Format | 
+|-------------|---------------------------------|----------------------------------|--|
+| input1       | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
+| input2       | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
 
 ## Protocol Columns
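
A reading aid for the `location#selector` rule and the `Data Format` column added in `ISA-XLSX.md`: the sketch below shows one way a consumer might split a `Data` cell value and check whether a `Data Format` value has the `type/subtype` shape. It is not part of the commit or of any ARC tooling. The function names `split_data_node` and `looks_like_mime_type` are hypothetical, splitting on the first `#` is an assumption, and the regular expression is only a rough shape check, not a lookup against the IANA registry.

```python
import re
from typing import Optional, Tuple

# Rough shape check for "type/subtype" (assumption: not validated against IANA).
_MIME_SHAPE = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*"
)


def split_data_node(value: str) -> Tuple[str, Optional[str]]:
    """Split a Data cell value into (location, selector).

    Assumes the first '#' separates location and selector. The selector is
    None when no '#' is present. Whitespace directly around the '#' is
    rejected, following the "no whitespace between" rule.
    """
    location, sep, selector = value.partition("#")
    if not sep:
        return location, None
    if location != location.rstrip() or selector != selector.lstrip():
        raise ValueError(f"whitespace around '#' in {value!r}")
    return location, selector


def looks_like_mime_type(data_format: str) -> bool:
    """Return True if the value has the two-part type/subtype shape."""
    return _MIME_SHAPE.fullmatch(data_format) is not None


if __name__ == "__main__":
    # Values taken from the example table in the diff.
    for cell in ("result.csv#col=1", "result.csv#col=2"):
        print(split_data_node(cell))
    print(looks_like_mime_type("text/csv"))
```

Because the spec allows a URL in place of a MIME type for unregistered formats, a `False` result from `looks_like_mime_type` does not by itself mean the `Data Format` value is invalid.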