additional payload #44
Replies: 2 comments
-
It must be possible to place AP files anywhere in an ARC, so that an ARC can simultaneously be, for example, an RO-Crate, which requires ro-crate-metadata.json in the top-level directory. Furthermore, users should be free to add files such as README.txt or other annotations in workflows or runs (or anywhere, really). Fundamentally, everything in an ARC is AP unless it is linked from top-level metadata (i.e. from arc.cwl / isa.investigation.xlsx). I think there is a distinction here between "making ARCs correct" and "making ARCs beautiful / clean". The latter could also be tackled with ARC quality control mechanisms, once they materialize. ("Why am I getting minus points for my ARC? Oh, I have these leftover files...")
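To make the rule "everything is AP unless it is linked from top-level metadata" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `find_additional_payload` is a hypothetical helper, not part of any ARC tooling, and it assumes the set of linked file paths has already been extracted from arc.cwl / isa.investigation.xlsx by some other means. It also assumes the two top-level metadata files themselves do not count as AP and that the `.git` directory is ignored.

```python
from pathlib import Path, PurePosixPath

# Assumption: the top-level metadata files named in this thread are not AP themselves.
TOP_LEVEL_METADATA = {"arc.cwl", "isa.investigation.xlsx"}


def find_additional_payload(arc_root, linked_paths):
    """Return sorted relative paths of files that are not linked from top-level metadata.

    linked_paths is a hypothetical input: file paths (relative to arc_root) that
    the top-level metadata refers to. How they are extracted is out of scope here.
    """
    root = Path(arc_root)
    linked = {PurePosixPath(p).as_posix() for p in linked_paths}
    payload = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.parts[0] == ".git":
            continue  # version-control internals are not ARC content
        rel_posix = rel.as_posix()
        if rel_posix in TOP_LEVEL_METADATA or rel_posix in linked:
            continue
        payload.append(rel_posix)
    return sorted(payload)
```

A quality control mechanism of the kind mentioned above could report this list as "leftover files" without treating the ARC as incorrect.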
-
I think this is a critical point for user-friendliness. Most people will likely be glad to find that the ARC does not come with too many hard requirements and can be adapted to their needs. All data "dumped" into an ARC is at least traceable and version-controlled through git (compared to the status quo of data flying around in emails and on unstructured servers and clouds). And with a bit of nudging (please put raw data into assays, metadata into isa.*.xlsx, etc.), which also aligns with most biologists' intuition, we at least capture the most important primary data. This also links with my comment on project management #15. Not properly structuring this kind of extra data (meeting minutes, sketches, etc.) won't eliminate reproducibility of the primary data.
-
I suggest dealing differently with additional payload files. I believe that reproducibility might be difficult to ensure if additional payload can be located anywhere within the ARC structure.
It would at least be good to have one subdirectory where all additional payload is located. Researchers might be reluctant to perform a gradual migration, or any migration at all, if they can leave their data as it is and it does not make much of a difference until data publication (which is pretty much the state we have right now). I see the possibility that researchers dump data somewhere in the ARC folder structure and do not take care of it while they still remember what it was used for or why it was needed. Of course, I see the need to put additional payload close to already correctly deposited data; on the other hand, a clear separation might motivate researchers to make a quicker transition.