additional payload #44

johanneswerner · 2021-07-28T15:18:14Z

I suggest handling additional payload files differently. I believe reproducibility may be difficult to ensure if additional payload can be located anywhere within the ARC structure.

It would at least be good to have one subdirectory where all additional payload is located. Researchers might be reluctant to migrate gradually, or at all, if they can leave their data as it is without it making much of a difference until data publication, which is pretty much the state we have right now. I see a risk that researchers dump data somewhere in the ARC folder structure and never tidy it up while they still remember what it was for and why it was needed. Of course, I see the need to put additional payload close to data that has already been deposited correctly. On the other hand, a clear separation might motivate researchers to make a quicker transition.

chgarth (Maintainer) · 2021-07-29T10:12:23Z

It must be possible to place AP files anywhere in an ARC, so that an ARC can simultaneously be an RO-Crate, for example, which requires ro-crate-metadata.json in the top-level directory. Furthermore, users should be free to add files such as README.txt or other annotations in workflows or runs (or anywhere, really). Fundamentally, everything in an ARC is AP unless it is linked from top-level metadata (i.e. from arc.cwl / isa.investigation.xlsx).
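A minimal sketch of the rule above ("everything is AP unless it is linked from top-level metadata"), for illustration only. The function name `find_additional_payload` is hypothetical, and the set of linked paths is supplied by the caller: extracting links from arc.cwl or isa.investigation.xlsx is not shown. The sketch also assumes that the two top-level metadata files themselves do not count as AP and that the `.git` directory is ignored.

```python
from pathlib import Path, PurePosixPath

# Top-level metadata files named in the comment above. Treating them as
# "not additional payload" is an assumption of this sketch.
TOP_LEVEL_METADATA = {"arc.cwl", "isa.investigation.xlsx"}


def find_additional_payload(arc_root, linked_paths):
    """Return sorted relative paths of files that are not linked from metadata.

    linked_paths: paths relative to arc_root that the top-level metadata
    links to (hypothetical input; how to obtain it is out of scope here).
    """
    root = Path(arc_root)
    # Normalise so that "./assays/x" and "assays/x" compare equal.
    linked = {PurePosixPath(p).as_posix() for p in linked_paths}
    payload = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.parts[0] == ".git":
            continue  # version-control internals are not ARC content
        rel_posix = rel.as_posix()
        if rel_posix in TOP_LEVEL_METADATA or rel_posix in linked:
            continue
        payload.append(rel_posix)
    return sorted(payload)
```

Under these assumptions, a top-level ro-crate-metadata.json or a README.txt inside workflows would be reported as AP wherever it sits, while a linked raw data file in an assay would not.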

I think there is a distinction here between "making ARCs correct" and "making ARCs beautiful / clean". The latter could also be tackled with ARC quality control mechanisms, once they materialize. ("Why am I getting minus points for my ARC? Oh I have these leftover files...")

Brilator · 2021-07-30T06:41:55Z

I think this is a critical point for user-friendliness. Most people will likely be glad to find that the ARC does not come with too many hard requirements and can be adapted to their needs.

All data "dumped" into an ARC is at least traceable and version-controlled through git (compared to the status quo of data flying around in emails, unstructured servers and clouds). And with a bit of nudging (please put raw data into assays, metadata into isa.*.xlsx, etc.), which also aligns with most biologists' intuition, we at least capture the most important primary data.

This also links with my comment on project management #15. Not properly structuring this kind of extra data (meeting minutes, sketches, etc.) won't compromise the reproducibility of primary data.
Let them grow their ARC easily. This goes a lot further than just dumping data on some file server.
