Rework Data Nodes #93

Merged
merged 4 commits into from
Feb 8, 2024
8 changes: 5 additions & 3 deletions ARC specification.md
@@ -245,19 +245,21 @@ All metadata references to files or directories located inside the ARC MUST foll

#### Examples

In this example, there are two `assays`, with `Assay1`containing a measurement of a `Source` material, producing an output `Raw Data file`. `Assay2` references this `Data file` for producing a new `Derived Data File`
##### General Pattern

In this example, there are two `assays`. `Assay1` contains a measurement of a `Source` material and produces an output `Data`. `Assay2` references this `Data` to produce a new `Data`.

Use of `general pattern` relative paths from the arc root folder:

`assays/Assay1/isa.assay.xlsx`:

| Input [Source Name] | Parameter[Instrument model] | Output [Raw Data File] |
| Input [Source Name] | Parameter[Instrument model] | Output [Data] |
|-------------|---------------------------------|----------------------------------|
| input | Bruker 500 Avance | assays/Assay1/dataset/measurement.txt |

`assays/Assay2/isa.assay.xlsx`:

| Input [Raw Data File] | Parameter[script file] | Output [Derived Data File] |
| Input [Data] | Parameter[script file] | Output [Data] |
|----------------------------------|---------------------------------|----------------------------------|
| assays/Assay1/dataset/measurement.txt | assays/Assay2/dataset/script.sh | assays/Assay2/dataset/result.txt |
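
As a rough illustration of how the shared root-relative path connects the two tables, the following sketch looks up which `Assay1` row produced the `Data` that `Assay2` consumes. The dictionary layout and the `producing_rows` helper are hypothetical and only serve the example; the specification does not prescribe any such API.

```python
# Rows transcribed from the two example annotation tables (hypothetical in-memory layout).
assay1_rows = [
    {
        "Input [Source Name]": "input",
        "Parameter[Instrument model]": "Bruker 500 Avance",
        "Output [Data]": "assays/Assay1/dataset/measurement.txt",
    },
]
assay2_rows = [
    {
        "Input [Data]": "assays/Assay1/dataset/measurement.txt",
        "Parameter[script file]": "assays/Assay2/dataset/script.sh",
        "Output [Data]": "assays/Assay2/dataset/result.txt",
    },
]


def producing_rows(data_path, rows):
    """Return the rows whose Output [Data] equals the given root-relative path."""
    return [row for row in rows if row["Output [Data]"] == data_path]


# Because both tables use the same path relative to the ARC root,
# a plain string comparison is enough to link them in this sketch.
for row in assay2_rows:
    sources = producing_rows(row["Input [Data]"], assay1_rows)
    print(row["Input [Data]"], "<-", [s["Input [Source Name]"] for s in sources])
```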

24 changes: 19 additions & 5 deletions ISA-XLSX.md
@@ -620,15 +620,29 @@ Each annotation table sheet MUST contain at most one `Input` and at most one `Ou

- An `Extract Material` MUST be indicated with the node type `Material Name`.

- An `Image File` MUST be indicated with the node type `Image File`.
- A `Data` object MUST be indicated with the node type `Data`.

- A `Raw Data File` MUST be indicated with the node type `Raw Data File`.
`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

- A `Derived Data File` MUST be indicated with the node type `Derived Data File`.
The `Data` node type MUST correspond to a relevant data resource location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns. If the annotation of the `Data` node refers to only a part of the resource rather than the complete resource, a `Selector` MAY be added. This Selector MUST be separated from the resource location using a `#`, with no whitespace between: `location#selector`. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the [W3C](https://www.w3.org/TR/annotation-model/#fragment-selector).

The format of the data resource MAY be further qualified using a `Data Format` column. The `Data Format` SHOULD be expressed as a [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types), most commonly consisting of two parts, a type and a subtype, separated by a slash (/) with no whitespace between: `type/subtype`. If appropriate, a format from the list maintained by [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml) SHOULD be picked. Unregistered or niche encodings and file formats MAY instead be indicated via the most appropriate URL.

The format and usage info about the Selector MAY be further qualified using a `Data Selector Format` column. The `Data Selector Format` SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.


## Examples

### Data Location and Selector

In this example, there is a measurement of two `Samples`, namely `input1` and `input2`. The measured values are both written into the same data resource at the location `result.csv`, whose format is tabular, as indicated by the `Data Format` `text/csv`. To distinguish between the measurement values stemming from the different inputs, selectors were added to the resource location (separated by a `#`), namely `col=1` and `col=2`. The specification of the formatting of these selectors can be found at the provided link, namely `https://datatracker.ietf.org/doc/html/rfc7111`.

`Source Names`, `Sample Names`, `Material Names` MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

`Image File`, `Raw Data File` or `Derived Data File` node types MUST correspond to a relevant file location, following the [Data Path Annotation](/ARC%20specification.md#data-path-annotation) patterns.
| Input [Sample Name] | Output [Data] | Data Format | Data Selector Format |
|-------------|---------------------------------|----------------------------------|--|
| input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
| input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
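
A minimal sketch of how a tool might split a `Data` cell value of the form `location#selector` into its two parts. The `DataNode` type and `parse_data_node` function are hypothetical names introduced for illustration, and splitting at the first `#` is an assumption; the specification only states that a `#` separates the location from the Selector.

```python
from typing import NamedTuple, Optional


class DataNode(NamedTuple):
    """Hypothetical container for a parsed Data cell value."""
    location: str
    selector: Optional[str]


def parse_data_node(value: str) -> DataNode:
    """Split a Data cell value into resource location and optional Selector."""
    # Assumption: the first '#' separates the location from the Selector.
    location, sep, selector = value.partition("#")
    # Without a '#', the cell refers to the complete resource.
    return DataNode(location, selector if sep else None)


# Values taken from the example table.
rows = [
    ("input1", "result.csv#col=1"),
    ("input2", "result.csv#col=2"),
]
for sample, cell in rows:
    node = parse_data_node(cell)
    print(sample, node.location, node.selector)
```

A cell without a `#`, such as `assays/Assay1/dataset/measurement.txt`, yields a `selector` of `None` in this sketch, meaning the annotation refers to the whole resource.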

## Protocol Columns
