Merge pull request #93 from nfdi4plants/selector
Rework Data Nodes
HLWeil authored Feb 8, 2024
2 parents 01d5483 + 1f6e246 commit 604a083
Showing 2 changed files with 24 additions and 8 deletions.
8 changes: 5 additions & 3 deletions ARC specification.md
@@ -255,19 +255,21 @@ All metadata references to files or directories located inside the ARC MUST foll

### Examples

In this example, there are two `assays`, with `Assay1` containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`.
##### General Pattern

In this example, there are two `assays`. `Assay1` contains a measurement of a `Source` material and produces an output `Data` node. `Assay2` references this `Data` node as its input to produce a new `Data` node.

Use of `general pattern` relative paths from the arc root folder:

`assays/Assay1/isa.assay.xlsx`:

| Input [Source Name] | Parameter[Instrument model] | Output [Raw Data File] |
| Input [Source Name] | Parameter[Instrument model] | Output [Data] |
|-------------|---------------------------------|----------------------------------|
| input | Bruker 500 Avance | assays/Assay1/dataset/measurement.txt |

`assays/Assay2/isa.assay.xlsx`:

| Input [Raw Data File] | Parameter[script file] | Output [Derived Data File] |
| Input [Data] | Parameter[script file] | Output [Data] |
|----------------------------------|---------------------------------|----------------------------------|
| assays/Assay1/dataset/measurement.txt | assays/Assay2/dataset/script.sh | assays/Assay2/dataset/result.txt |

24 changes: 19 additions & 5 deletions ISA-XLSX.md
@@ -620,15 +620,29 @@ Each annotation table sheet MUST contain at most one `Input` and at most one `Ou

- An `Extract Material` MUST be indicated with the node type `Material Name`.

- An `Image File` MUST be indicated with the node type `Image File`.
- A `Data` object MUST be indicated with the node type `Data`.

- A `Raw Data File` MUST be indicated with the node type `Raw Data File`.
`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

- A `Derived Data File` MUST be indicated with the node type `Derived Data File`.
The `Data` node type MUST correspond to a relevant data resource location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns. If the annotation of the `Data` node refers not to the complete resource but to a part of it, a `Selector` MAY be added. This Selector MUST be separated from the resource location by a `#`, with no whitespace in between: `location#selector`. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the [W3C](https://www.w3.org/TR/annotation-model/#fragment-selector).
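
The `location#selector` convention can be illustrated with a minimal Python sketch. The names `DataReference` and `split_data_reference` are hypothetical and not part of the specification, and the sketch assumes that the resource location itself contains no `#`, so the first `#` marks the start of the Selector:

```python
from typing import NamedTuple, Optional


class DataReference(NamedTuple):
    """A `Data` cell value split into its two parts (hypothetical helper type)."""
    location: str
    selector: Optional[str]


def split_data_reference(value: str) -> DataReference:
    """Split a `Data` cell value of the form `location#selector`.

    Assumption: the location contains no '#', so the first '#' is the separator.
    A value without '#' refers to the complete resource (selector is None).
    """
    location, separator, selector = value.partition("#")
    return DataReference(location, selector if separator else None)


print(split_data_reference("result.csv#col=1"))
print(split_data_reference("assays/Assay1/dataset/measurement.txt"))
```

A value without a `#` yields `selector=None`, which matches the case where the annotation refers to the complete resource.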

The format of the data resource MAY be further qualified using a `Data Format` column. The `Data Format` SHOULD be expressed as a [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts, a type and a subtype, separated by a slash (`/`) with no whitespace in between: `type/subtype`. If appropriate, a format from the list maintained by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml) SHOULD be picked. Unregistered or niche encodings and file formats MAY instead be indicated via the most appropriate URL.
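
As an illustration only, a tool reading the `Data Format` column might distinguish between a `type/subtype` value and a URL roughly as follows. The function name `classify_data_format` is hypothetical, and the pattern is a simplified approximation of the media type grammar; it does not check whether a type is actually registered with IANA:

```python
import re

# Simplified `type/subtype` pattern (assumption: approximates the restricted
# name characters allowed in media types; no IANA registry lookup is done).
_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*"
_MIME_PATTERN = re.compile(rf"^{_NAME}/{_NAME}$")


def classify_data_format(value: str) -> str:
    """Return 'mime', 'url', or 'other' for a `Data Format` cell value."""
    if _MIME_PATTERN.match(value):
        return "mime"
    if value.startswith(("http://", "https://")):
        return "url"
    return "other"


print(classify_data_format("text/csv"))
print(classify_data_format("https://example.org/formats/niche"))
print(classify_data_format("text / csv"))
```

The last call shows that a value containing whitespace around the slash is not accepted as `type/subtype`.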

The format and usage info about the Selector MAY be further qualified using a `Data Selector Format` column. The `Data Selector Format` SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.


## Examples

### Data Location and Selector

In this example, two `Samples`, `input1` and `input2`, are measured. Both measured values are written to the same data resource at the location `result.csv`, which is tabular, as indicated by the `Data Format` `text/csv`. To distinguish the values stemming from the different inputs, selectors are appended to the resource location (separated by a `#`), namely `col=1` and `col=2`. The specification of the selector format can be found at the link given in the `Data Selector Format` column, namely `https://datatracker.ietf.org/doc/html/rfc7111`.

`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

`Image File`, `Raw Data File` or `Derived Data File` node types MUST correspond to a relevant file location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns.
| Input [Sample Name] | Output [Data] | Data Format | Data Selector Format |
|-------------|---------------------------------|----------------------------------|--|
| input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
| input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
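
How a consumer might resolve such a selector can be sketched as follows. This is a minimal illustration, not a complete RFC 7111 implementation: the function name `select_column` and the CSV contents are hypothetical, only the single-column form `col=N` is handled, and column numbering is assumed to start at 1:

```python
import csv
import io
from typing import List


def select_column(csv_text: str, selector: str) -> List[str]:
    """Return the values of one column for a selector of the form `col=N`.

    Assumptions: N is 1-based; ranges and `row=`/`cell=` selectors are not handled.
    """
    key, _, value = selector.partition("=")
    if key != "col" or not value.isdigit() or int(value) < 1:
        raise ValueError(f"unsupported selector: {selector!r}")
    index = int(value) - 1
    rows = csv.reader(io.StringIO(csv_text))
    # Skip rows that are too short to contain the requested column.
    return [row[index] for row in rows if len(row) > index]


# Hypothetical contents of result.csv: one column per measured input.
result_csv = "1.2,3.4\n5.6,7.8\n"

print(select_column(result_csv, "col=1"))  # values annotated for input1
print(select_column(result_csv, "col=2"))  # values annotated for input2
```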

## Protocol Columns
