-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scans.tsv beyond first session not found #542
Comments
Can you give me more details? Because I seem to not be able to reproduce the error. Trying on this example dataset: https://github.com/bids-standard/bids-examples/tree/master/motion_spotrotation >> BIDS = bids.layout(fullfile(pwd, 'motion_spotrotation'))
BIDS =
struct with fields:
pth: '/home/remi/github/bids/matlab/tests/bids-examples/motion_spotrotation'
description: [1×1 struct]
sessions: {}
participants: [1×1 struct]
subjects: [1×10 struct]
root: [0×0 struct]
is_datalad_ds: 0
>> bids.query(BIDS, 'sessions', 'sub', '01')
ans =
1×2 cell array
{'body'} {'joy'}
>> BIDS.subjects(1).scans
ans =
'/home/remi/github/bids/matlab/tests/bids-examples/motion_spotrotation/sub-01/ses-body/sub-01_ses-body_scans.tsv'
>> BIDS.subjects(2).scans
ans =
'/home/remi/github/bids/matlab/tests/bids-examples/motion_spotrotation/sub-01/ses-joy/sub-01_ses-joy_scans.tsv' |
Note to myself: scans and sessions should be accessible by using |
my apologies, the subjects.scans field is complete after all. problem was due to me having two test versions of a dataset, one of which didn't have scans.tsv files available yet for the second session. regarding your note to self: as it happens my next question would have been whether it's currently possible to directly query scans and sessions tsv files, but I suppose that's answered :) would be great to have that functionality though, as well as the option to directly query and return the contents of tsv files in a structured manner (as is possible for jsons by using the metadata flag). so far I've been doing this manually by e.g. looping across all scans or sessions fields and calling bids.util.tsvread for context: we're in the process of converting several large and complex datasets to bids. this bids-matlab tool is already really helpful for doing so, but added querying functionality would make data monitoring and selection even more convenient. |
I can try to extend the functionality of query to cover that too.
Not sure which type of data this is "converting" (is it more than "renaming") but hope you are making good use of the bids.File to creating filenames. will try to add scan and session query let me know if there is any low hanging fruits you may need. |
wonderful, thanks. one thing I can think of: similar to querying file paths of tsv files with a particular suffix:
the user could request the contents of particular tsv types (scans.tsv or sessions.tsv, but also channels.tsv, events.tsv, or whatever the case may be) |
So something like: bids.query(BIDS, 'content',...
'extension', '.tsv',...
'suffix', {'channels','events'}) |
yes, though I suppose 'content' would only really be applicable to the tsv extension. perhaps 'tsvcontent' or 'tsvdata' to avoid trying to read the contents of other file types (and prob without providing an 'extension' flag)? think this would be analogous to how filters currently operate on 'metadata' and 'metafiles' queries (all filters work, but supplying an 'extension' key-value pair results in an empty cell, presumably bc extension can only be .json) minor issue may be inconsistency/unclarity as to how to obtain file paths and contents of jsons (from querying 'metafiles' and 'metadata' respectively) vs. getting file paths and contents of tsvs (from querying 'data' [while specifying 'extension' as '.tsv'] and a new 'content' or 'tsvcontent' type, respectively). as I'm trying things out, I noticed that querying tsv files doesn't actually return all tsv files: |
There are still a few blind spots in terms of what gets indexed or can be queried. See for example. But I had not realized that it was also the case for electrodes files. Same with your other suggestions: will open separate issues and PR for each Feature to keep the discussions focused. |
Can you tell me more about the dataset where you cannot query electrodes.tsv files? I have a test set up on one of the EEG bids-exemple and it does pick up the electrodes suffix and also checked manually that filter by extension "tsv" also grabs the electrodes.tsv
>> BIDS = bids.layout(fullfile(pth_bids_example, 'eeg_face13'));
>> extension = bids.query(BIDS,'data','extension','.tsv')
extension =
30×1 cell array
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-002/eeg/sub-002_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-002/eeg/sub-002_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-002/eeg/sub-002_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-003/eeg/sub-003_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-003/eeg/sub-003_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-003/eeg/sub-003_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-004/eeg/sub-004_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-004/eeg/sub-004_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-004/eeg/sub-004_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-005/eeg/sub-005_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-005/eeg/sub-005_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-005/eeg/sub-005_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-006/eeg/sub-006_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-006/eeg/sub-006_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-006/eeg/sub-006_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-007/eeg/sub-007_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-007/eeg/sub-007_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-007/eeg/sub-007_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-008/eeg/sub-008_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-008/eeg/sub-008_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-008/eeg/sub-008_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-009/eeg/sub-009_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-009/eeg/sub-009_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-009/eeg/sub-009_task-faceFO_events.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-010/eeg/sub-010_electrodes.tsv' }
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-010/eeg/sub-010_task-faceFO_channels.tsv'}
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-010/eeg/sub-010_task-faceFO_events.tsv' } |
Got a couple of fixes to make to it but once this PR is merged, it should allow you to query sessions and scans tsv files like any other files. Also working on something that would allow to get directly the content of a bunch of tsv files through a query. |
thanks for your efforts. using the example dataset, all the *_electrodes.tsv files are indeed returned. however, for our own datasets they are not. my guess would be this has to do with additional entities in our file names, e.g., |
Thanks for checking. I think that these are not valid BIDS filenames. At least not from what I can see in the EEG or iEEG pages:
Does your dataset pass the BIDS validator? In "normal" mode bids-matlab will rely on the bids-schema to decide what files should be indexed. So if the files are not BIDS valid, they will be quietly skipped. You can switch to verbose mode so that bids-matlab will tell you which subjects are being indexed and which files are being skipped. See below on the same example I used but where I renamed the electrodes.tsv to be more like the one you used. >> BIDS = bids.layout(pwd, 'verbose', true);
Indexing dataset:
/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13
Indexing subject: sub-001 [.Warning:
Skipping file: sub-001_task-faceFO_electrodes.tsv.
Unknown entities: 'task'
Try using the parameter: " 'use_schema', false "
> In bids.internal.error_handling (line 57)
In bids.internal.append_to_layout (line 112)
In bids.layout>parse_using_schema (line 409)
In bids.layout>parse_subject (line 350)
In bids.layout (line 183)
]
Indexing subject: sub-002 [.]
Indexing subject: sub-003 [.]
Indexing subject: sub-004 [.]
Indexing subject: sub-005 [.]
Indexing subject: sub-006 [.]
Indexing subject: sub-007 [.]
Indexing subject: sub-008 [.]
Indexing subject: sub-009 [.]
Indexing subject: sub-010 [.] As the warning implies you can ask bids-matlab to not rely on the schema for the indexing. Then files will be indexed as long as they have a generic bids filename pattern with no restriction on what entity is present for each file. This should let you be able to query this file: see below. >> BIDS = bids.layout(pwd, 'verbose', true, 'use_schema', false);
Indexing dataset:
/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13
Indexing subject: sub-001 [.]
Indexing subject: sub-002 [.]
Indexing subject: sub-003 [.]
Indexing subject: sub-004 [.]
Indexing subject: sub-005 [.]
Indexing subject: sub-006 [.]
Indexing subject: sub-007 [.]
Indexing subject: sub-008 [.]
Indexing subject: sub-009 [.]
Indexing subject: sub-010 [.]
>> bids.query(BIDS, 'data', 'sub', '001', 'suffix', 'electrodes')
ans =
1×1 cell array
{'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_task-faceFO_electrodes.tsv'} |
In theory this example should give you hints on to do this but I agree that this is not enough.
Similarly if you can think of how the help section of bids.layout can be made more "clear", suggestions are more than welcome: https://bids-matlab.readthedocs.io/en/latest/layout_query.html#+bids.layout I am so used to see / use this code I sometimes need an external pair of eyes to tell 'this is not clear' so feel free to come up with improvement suggestions. |
checked: our *_electrodes.tsv files do pass the online validator, but you're right that the electrodes description page doesn't allow/mention task or run information. (the electrodes files ARE indexed by the validator, bc there were some warnings about their contents that still need fixing) also checked the channels page (https://bids-specification.readthedocs.io/en/latest/modality-specific-files/electroencephalography.html#channels-description-_channelstsv): more flexible but didn't realize it also doesn't really accommodate all entities. wonder if that's intentional or an oversight, bc it's a reality that available channels and/or electrodes can vary for each eeg file within a session. ideal for both electrodes and channels files would be the structure used for events: https://bids-specification.readthedocs.io/en/latest/modality-specific-files/task-events.html, where you essentially copy the entire filename with whatever entities are present. all that being said, setting |
you are correct that the validator is fine with electrodes.tsv having more entities than the spec says: it does the same on my dummy example. There should be consistency between the validator and the spec. Let me check if there is not an issue opened about this in another repo. |
OK this is a known inconsistency of the validator that is too permissive compare to what the specification of BIDS says: https://github.com/bids-standard/bids-validator/issues/1198 if you want to see some of the reasoning behind the limited set of entities allowed in the electrodes.tsv filename I have found some posts about it:
I have not seen anything explicit about this rule in the specification: I will open an issue about this. |
altered our *electrodes.tsv filenames to be in line with the specification. so of form however, doing this means that (or perhaps more generally: shouldn't all files whose entities match the entities of a primary file be listed as a dependency, even when the primary file has additional entities?) |
closing this issue and moving the last comment to a new issue |
@all-contributors please add @coxroy for bug, ideas, userTesting |
I've put up a pull request to add @coxroy! 🎉 |
hello,
using BIDS=bids.layout to index a bids folder structure, it appears that scans.tsv files beyond the first session aren't recognized.
so
BIDS.subjects(index_to subject_session_1).scans
give the path to the relevant scans.tsv file, butBIDS.subjects(index_to subject_session_2).scans
remains empty even though the file existsThe text was updated successfully, but these errors were encountered: