Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scans.tsv beyond first session not found #542

Closed
coxroy opened this issue Apr 24, 2023 · 20 comments
Closed

scans.tsv beyond first session not found #542

coxroy opened this issue Apr 24, 2023 · 20 comments

Comments

@coxroy
Copy link

coxroy commented Apr 24, 2023

hello,

using BIDS=bids.layout to index a bids folder structure, it appears that scans.tsv files beyond the first session aren't recognized.

so BIDS.subjects(index_to subject_session_1).scans give the path to the relevant scans.tsv file, but BIDS.subjects(index_to subject_session_2).scans remains empty even though the file exists

@Remi-Gau
Copy link
Collaborator

Can you give me more details? Because I seem to not be able to reproduce the error.

Trying on this example dataset: https://github.com/bids-standard/bids-examples/tree/master/motion_spotrotation

>> BIDS = bids.layout(fullfile(pwd, 'motion_spotrotation'))
BIDS = 
  struct with fields:

              pth: '/home/remi/github/bids/matlab/tests/bids-examples/motion_spotrotation'
      description: [1×1 struct]
         sessions: {}
     participants: [1×1 struct]
         subjects: [1×10 struct]
             root: [0×0 struct]
    is_datalad_ds: 0

>> bids.query(BIDS, 'sessions', 'sub', '01')
ans =
  1×2 cell array
    {'body'}    {'joy'}

>> BIDS.subjects(1).scans
ans =
    '/home/remi/github/bids/matlab/tests/bids-examples/motion_spotrotation/sub-01/ses-body/sub-01_ses-body_scans.tsv'

>> BIDS.subjects(2).scans
ans =
    '/home/remi/github/bids/matlab/tests/bids-examples/motion_spotrotation/sub-01/ses-joy/sub-01_ses-joy_scans.tsv'

@Remi-Gau
Copy link
Collaborator

Note to myself: scans and sessions should be accessible by using bids.query, because there is no guarantee that the output structure of bids.layout will always have the same "shape" (it could literally change from one commit to the next without deprecation warning), where as the API of bids.query is guaranteed much better stability (and proper deprecation cycles).

@coxroy
Copy link
Author

coxroy commented Apr 24, 2023

my apologies, the subjects.scans field is complete after all. problem was due to me having two test versions of a dataset, one of which didn't have scans.tsv files available yet for the second session.

regarding your note to self: as it happens my next question would have been whether it's currently possible to directly query scans and sessions tsv files, but I suppose that's answered :)

would be great to have that functionality though, as well as the option to directly query and return the contents of tsv files in a structured manner (as is possible for jsons by using the metadata flag). so far I've been doing this manually by e.g. looping across all scans or sessions fields and calling bids.util.tsvread

for context: we're in the process of converting several large and complex datasets to bids. this bids-matlab tool is already really helpful for doing so, but added querying functionality would make data monitoring and selection even more convenient.

@Remi-Gau
Copy link
Collaborator

would be great to have that functionality though, as well as the option to directly query and return the contents of tsv files in a structured manner (as is possible for jsons by using the metadata flag). so far I've been doing this manually by e.g. looping across all scans or sessions fields and calling bids.util.tsvread

I can try to extend the functionality of query to cover that too.

for context: we're in the process of converting several large and complex datasets to bids. this bids-matlab tool is already really helpful for doing so, but added querying functionality would make data monitoring and selection even more convenient.

Not sure which type of data this is "converting" (is it more than "renaming") but hope you are making good use of the bids.File to creating filenames.

will try to add scan and session query

let me know if there is any low hanging fruits you may need.

@coxroy
Copy link
Author

coxroy commented Apr 24, 2023

wonderful, thanks.

one thing I can think of: similar to querying file paths of tsv files with a particular suffix:

bids.query(BIDS,'data','extension','.tsv','suffix',{'channels','events'})

the user could request the contents of particular tsv types (scans.tsv or sessions.tsv, but also channels.tsv, events.tsv, or whatever the case may be)

@Remi-Gau
Copy link
Collaborator

So something like: content instead of data

bids.query(BIDS, 'content',...
           'extension', '.tsv',...
           'suffix', {'channels','events'})

@coxroy
Copy link
Author

coxroy commented Apr 24, 2023

yes, though I suppose 'content' would only really be applicable to the tsv extension. perhaps 'tsvcontent' or 'tsvdata' to avoid trying to read the contents of other file types (and prob without providing an 'extension' flag)? think this would be analogous to how filters currently operate on 'metadata' and 'metafiles' queries (all filters work, but supplying an 'extension' key-value pair results in an empty cell, presumably bc extension can only be .json)

minor issue may be inconsistency/unclarity as to how to obtain file paths and contents of jsons (from querying 'metafiles' and 'metadata' respectively) vs. getting file paths and contents of tsvs (from querying 'data' [while specifying 'extension' as '.tsv'] and a new 'content' or 'tsvcontent' type, respectively).

as I'm trying things out, I noticed that querying tsv files doesn't actually return all tsv files:
for example, bids.query(BIDS,'data','extension','.tsv' returns '_events.tsv' and '_channels.tsv' files, but not the '_electrodes.tsv' files that we also have. similarly, bids.query(BIDS,'suffixes') skips 'electrodes' as a suffix. perhaps intentional, but given that bids allows many different suffixes before the tsv extension (even custom ones I think?), would be convenient to just index whatever tsv files are available in the dataset (and supposedly deemed necessary by the user) when initially calling bids.layout

@Remi-Gau
Copy link
Collaborator

There are still a few blind spots in terms of what gets indexed or can be queried.

See for example.

#432

But I had not realized that it was also the case for electrodes files.
Will double check and open an issue.

Same with your other suggestions: will open separate issues and PR for each Feature to keep the discussions focused.

@Remi-Gau
Copy link
Collaborator

@coxroy

Can you tell me more about the dataset where you cannot query electrodes.tsv files?

I have a test set up on one of the EEG bids-exemple and it does pick up the electrodes suffix and also checked manually that filter by extension "tsv" also grabs the electrodes.tsv

suffixes = {'channels', 'eeg', 'electrodes', 'events'};

>> BIDS = bids.layout(fullfile(pth_bids_example, 'eeg_face13'));
>> extension = bids.query(BIDS,'data','extension','.tsv')
extension =
  30×1 cell array
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-002/eeg/sub-002_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-002/eeg/sub-002_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-002/eeg/sub-002_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-003/eeg/sub-003_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-003/eeg/sub-003_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-003/eeg/sub-003_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-004/eeg/sub-004_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-004/eeg/sub-004_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-004/eeg/sub-004_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-005/eeg/sub-005_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-005/eeg/sub-005_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-005/eeg/sub-005_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-006/eeg/sub-006_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-006/eeg/sub-006_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-006/eeg/sub-006_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-007/eeg/sub-007_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-007/eeg/sub-007_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-007/eeg/sub-007_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-008/eeg/sub-008_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-008/eeg/sub-008_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-008/eeg/sub-008_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-009/eeg/sub-009_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-009/eeg/sub-009_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-009/eeg/sub-009_task-faceFO_events.tsv'  }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-010/eeg/sub-010_electrodes.tsv'          }
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-010/eeg/sub-010_task-faceFO_channels.tsv'}
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-010/eeg/sub-010_task-faceFO_events.tsv'  }

@Remi-Gau
Copy link
Collaborator

Got a couple of fixes to make to it but once this PR

#550

is merged, it should allow you to query sessions and scans tsv files like any other files.

Also working on something that would allow to get directly the content of a bunch of tsv files through a query.

@coxroy
Copy link
Author

coxroy commented May 1, 2023

thanks for your efforts. using the example dataset, all the *_electrodes.tsv files are indeed returned.

however, for our own datasets they are not. my guess would be this has to do with additional entities in our file names, e.g., sub-ercp1001_ses-w0_task-resteyesclosed_run-01_electrodes.tsv. as far as I can tell the BIDS format allows for this, and we use it to distinguish between (slightly) different electrode (and channel) layouts between runs/tasks etc.

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented May 1, 2023

Thanks for checking.

I think that these are not valid BIDS filenames.

At least not from what I can see in the EEG or iEEG pages:

Does your dataset pass the BIDS validator?

In "normal" mode bids-matlab will rely on the bids-schema to decide what files should be indexed. So if the files are not BIDS valid, they will be quietly skipped.

You can switch to verbose mode so that bids-matlab will tell you which subjects are being indexed and which files are being skipped.

See below on the same example I used but where I renamed the electrodes.tsv to be more like the one you used.

>> BIDS = bids.layout(pwd, 'verbose', true);


Indexing dataset:
	/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13
 Indexing subject: sub-001 [.Warning:
Skipping file: sub-001_task-faceFO_electrodes.tsv.
 Unknown entities: 'task'
Try using the parameter: " 'use_schema', false "

 
> In bids.internal.error_handling (line 57)
  In bids.internal.append_to_layout (line 112)
  In bids.layout>parse_using_schema (line 409)
  In bids.layout>parse_subject (line 350)
  In bids.layout (line 183) 
]
 Indexing subject: sub-002 [.]
 Indexing subject: sub-003 [.]
 Indexing subject: sub-004 [.]
 Indexing subject: sub-005 [.]
 Indexing subject: sub-006 [.]
 Indexing subject: sub-007 [.]
 Indexing subject: sub-008 [.]
 Indexing subject: sub-009 [.]
 Indexing subject: sub-010 [.]

As the warning implies you can ask bids-matlab to not rely on the schema for the indexing. Then files will be indexed as long as they have a generic bids filename pattern with no restriction on what entity is present for each file.

This should let you be able to query this file: see below.

>> BIDS = bids.layout(pwd, 'verbose', true, 'use_schema', false);

Indexing dataset:
	/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13
 Indexing subject: sub-001 [.]
 Indexing subject: sub-002 [.]
 Indexing subject: sub-003 [.]
 Indexing subject: sub-004 [.]
 Indexing subject: sub-005 [.]
 Indexing subject: sub-006 [.]
 Indexing subject: sub-007 [.]
 Indexing subject: sub-008 [.]
 Indexing subject: sub-009 [.]
 Indexing subject: sub-010 [.]

>> bids.query(BIDS, 'data', 'sub', '001', 'suffix', 'electrodes')
ans =
  1×1 cell array
    {'/home/remi/github/bids/matlab/tests/bids-examples/eeg_face13/sub-001/eeg/sub-001_task-faceFO_electrodes.tsv'}

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented May 1, 2023

In theory this example should give you hints on to do this but I agree that this is not enough.

% To work with derivatives data, we must ignore the BIDS schema for indexing.

Similarly if you can think of how the help section of bids.layout can be made more "clear", suggestions are more than welcome: https://bids-matlab.readthedocs.io/en/latest/layout_query.html#+bids.layout

I am so used to see / use this code I sometimes need an external pair of eyes to tell 'this is not clear' so feel free to come up with improvement suggestions.

@coxroy
Copy link
Author

coxroy commented May 1, 2023

checked: our *_electrodes.tsv files do pass the online validator, but you're right that the electrodes description page doesn't allow/mention task or run information. (the electrodes files ARE indexed by the validator, bc there were some warnings about their contents that still need fixing)

also checked the channels page (https://bids-specification.readthedocs.io/en/latest/modality-specific-files/electroencephalography.html#channels-description-_channelstsv): more flexible but didn't realize it also doesn't really accommodate all entities.

wonder if that's intentional or an oversight, bc it's a reality that available channels and/or electrodes can vary for each eeg file within a session. ideal for both electrodes and channels files would be the structure used for events: https://bids-specification.readthedocs.io/en/latest/modality-specific-files/task-events.html, where you essentially copy the entire filename with whatever entities are present.

all that being said, setting use_schema to false fixes it all. thx for pointing out, didn't realize that's what this flag does. [will test the tsv_content functionality very shortly]

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented May 1, 2023

you are correct that the validator is fine with electrodes.tsv having more entities than the spec says: it does the same on my dummy example.

There should be consistency between the validator and the spec.

Let me check if there is not an issue opened about this in another repo.

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented May 1, 2023

OK this is a known inconsistency of the validator that is too permissive compare to what the specification of BIDS says: https://github.com/bids-standard/bids-validator/issues/1198

if you want to see some of the reasoning behind the limited set of entities allowed in the electrodes.tsv filename I have found some posts about it:

I have not seen anything explicit about this rule in the specification: I will open an issue about this.

@coxroy
Copy link
Author

coxroy commented May 5, 2023

altered our *electrodes.tsv filenames to be in line with the specification. so of form sub-xxxx_ses-xxxx_electrodes.tsv, with there only being one such file per subject-session-modality folder.

however, doing this means that bids.layout no longer indexes these electrodes files as dependencies of the EEG data files they belong to. that is, for a given EEG data file (e.g. BIDS.subjects(sub_idx).eeg(file_idx) ), the dependencies.group field does not contain the electrodes file (whereas previously - with its lengthier incorrect filename - it did). guessing this is due to not all entities in data filename (eg., run and task) being present in electrode filename? perhaps this is as intended, but would consider keeping the electrodes file listed as a dependency.

(or perhaps more generally: shouldn't all files whose entities match the entities of a primary file be listed as a dependency, even when the primary file has additional entities?)

@Remi-Gau
Copy link
Collaborator

closing this issue and moving the last comment to a new issue

@Remi-Gau
Copy link
Collaborator

@all-contributors please add @coxroy for bug, ideas, userTesting

@allcontributors
Copy link
Contributor

@Remi-Gau

I've put up a pull request to add @coxroy! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants