Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spikeglx: parse tcat input #1511

Open
tabedzki opened this issue Jul 24, 2024 · 13 comments
Open

Spikeglx: parse tcat input #1511

tabedzki opened this issue Jul 24, 2024 · 13 comments
Assignees
Labels
Milestone

Comments

@tabedzki
Copy link

tabedzki commented Jul 24, 2024

Describe the bug
I am unable to read in my lf files when using SpikeGLXRawIO. The file structure is currently as below. When I had all the lf files be t0 instead of tcat it worked flawlessly.

ya008_20240525_g0
├── ya008_20240525_g0_ct_offsets.txt
├── ya008_20240525_g0_fyi.txt
├── ya008_20240525_g0_imec0
│  ├── ya008_20240525_g0_t0.imec0.ap.bin
│  ├── ya008_20240525_g0_t0.imec0.ap.meta
│  ├── ya008_20240525_g0_tcat.imec0.lf.bin
│  └── ya008_20240525_g0_tcat.imec0.lf.meta
├── ya008_20240525_g0_imec1
│  ├── ya008_20240525_g0_t0.imec1.ap.bin
│  ├── ya008_20240525_g0_t0.imec1.ap.meta
│  ├── ya008_20240525_g0_tcat.imec1.lf.bin
│  └── ya008_20240525_g0_tcat.imec1.lf.meta
├── ya008_20240525_g0_t0.nidq.bin
└── ya008_20240525_g0_t0.nidq.meta

To Reproduce

(neuroconv_312) ➜  test2 ipython
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import neo.rawio

In [2]: reader = neo.rawio.SpikeGLXRawIO(dirname='./ya008_20240525_g0')

In [3]: reader.parse_header()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], line 1
----> 1 reader.parse_header()

File /scratch/ct5868/.conda/envs/neuroconv_312/lib/python3.12/site-packages/neo/rawio/baserawio.py:189, in BaseRawIO.parse_header(self)
    176 """
    177 Parses the header of the file(s) to allow for faster computations
    178 for all other functions
    179
    180 """
    181 # this must create
    182 # self.header['nb_block']
    183 # self.header['nb_segment']
   (...)
    186 # self.header['spike_channels']
    187 # self.header['event_channels']
--> 189 self._parse_header()
    190 self._check_stream_signal_channel_characteristics()
    191 self.is_header_parsed = True

File /scratch/ct5868/.conda/envs/neuroconv_312/lib/python3.12/site-packages/neo/rawio/spikeglxrawio.py:147, in SpikeGLXRawIO._parse_header(self)
    144 signal_channels = []
    145 for stream_name in stream_names:
    146     # take first segment
--> 147     info = self.signals_info_dict[0, stream_name]
    149     stream_id = stream_name
    150     stream_index = stream_names.index(info["stream_name"])

KeyError: (0, 'imec1.ap')

Environment:

  • OS: Linux
  • Python version: 3.12
  • Neo version: 0.13.1

Related to #1501 #1390

I am happy to provide access to the meta files. The SpikeGLX data is rather large so if having access to that would be necessary, I'd have to ask IT how to best distribute that data.

@tabedzki tabedzki added the bug label Jul 24, 2024
@apdavison apdavison added this to the 0.14.0 milestone Jul 26, 2024
@samuelgarcia
Copy link
Contributor

@alejoe91 can you handle this ?

@h-mayorquin
Copy link
Contributor

I think this should be fixed by this:

#1608

It has been a while but any chance you could try this @tabedzki ?

@alejoe91
Copy link
Contributor

alejoe91 commented Jan 8, 2025

@tabedzki just following up on this. The solution implemented in #1608 by @h-mayorquin should work here because files are parsed based on the original file name in the meta file. Can you test it on your end?

@tabedzki
Copy link
Author

Thanks for your patience. Unfortunately, this does not address the underlying issue.
I got the same error across other folders so it is not confined to this specific directory.

>>> import neo.rawio
>>> reader = neo.rawio.SpikeGLXRawIO(dirname='./jk44_08092024_g0')
>>> reader.parse_header()
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    reader.parse_header()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/Users/ct5868/.venv/lib/python3.13/site-packages/neo/rawio/baserawio.py", line 211, in parse_header
    self._parse_header()
    ~~~~~~~~~~~~~~~~~~^^
  File "/Users/ct5868/.venv/lib/python3.13/site-packages/neo/rawio/spikeglxrawio.py", line 157, in _parse_header
    info = self.signals_info_dict[0, stream_name]
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: (0, 'imec0.ap')
>>> neo.__version__
'0.14.0'
tree jk44_08092024_g0
jk44_08092024_g0
├── jk44_08092024_g0_ct_offsets.txt
├── jk44_08092024_g0_fyi.txt
├── jk44_08092024_g0_imec0
│   ├── jk44_08092024_g0_t0.imec0.ap.bin
│   ├── jk44_08092024_g0_t0.imec0.ap.meta
│   ├── jk44_08092024_g0_tcat.imec0.lf.bin
│   └── jk44_08092024_g0_tcat.imec0.lf.meta
├── jk44_08092024_g0_t0.nidq.bin
└── jk44_08092024_g0_t0.nidq.meta

2 directories, 8 files

@h-mayorquin
Copy link
Contributor

Can you print what is inside signals_info_dict?

We definitely need a more informative error there.

@alejoe91 alejoe91 reopened this Jan 23, 2025
@tabedzki
Copy link
Author

Thanks guys for looking into this more. Please find attached the dict requested
signals_info_dict.txt

@h-mayorquin
Copy link
Contributor

h-mayorquin commented Jan 23, 2025

OK, I understand the bug. Can you share the data with us so I can create a stub for this and add it to the tests?

The problem is that you combine a tcat signal with one that is not. The tcat is always the segment 0 but the other signal (ap) is assigned segment one. The the line that is trigger the error is assuming that there is always a segment 0 for every stream.

The scan_files function should be modified somehow to take this into account (order segments per stream).

@samuelgarcia @alejoe91 how common is to have tcat lf but not ap? It is not clear to me how we should handle this.

@tabedzki
Copy link
Author

I'd have to check with the researcher if I can share his data, though the data itself is large (95GB). What would be the recommended way to do that if I am granted permission?

@h-mayorquin
Copy link
Contributor

Well, if you don't have acesss to an s3 bucket, dropbox, google drive, mega, etc with that capability then I have a script for stubing spikeglx data here:

https://gist.github.com/h-mayorquin/b5abe6ae8e8484976c2d27e0bb04e2a2

We will have to fine-tune for your case, we can do it in a 15 minutes short meeting.

It is important for us to know why is tcat only applied to the lp band and not to ap. Moreover, why is the gating mechanism (g0) preserved on the naming even after tcat. In short, a summary description of the dataset for provenance on our test data.

@mik-schutte
Copy link

I seem to be facing a similar issue

Folder structure:

SNA-137704_mstim_20112024_S1_1_g0
├── catgt_SNA-137704_mstim_20112024_S1_1_g0
│   ├── SNA-137704_mstim_20112024_S1_1_g0_ct_offsets.txt
│   ├── SNA-137704_mstim_20112024_S1_1_g0_fyi.txt
│   ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.bin
│   ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.meta
│   ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.xd_384_6_500.txt
│   ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.meta
│   ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.xd_1_0_500.txt
│   └── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.xd_1_1_200.txt
├── SNA-137704_mstim_20112024_S1_1_g0_imec0
│   ├── SNA-137704_mstim_20112024_S1_1_g0_t0.imec0.ap.bin
│   └── SNA-137704_mstim_20112024_S1_1_g0_t0.imec0.ap.meta
├── SNA-137704_mstim_20112024_S1_1_g0_t0.nidq.bin
└── SNA-137704_mstim_20112024_S1_1_g0_t0.nidq.meta

Code to reproduce

import spikeinterface.extractors as se
path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0'
data = se.read_spikeglx(folder_path=path, stream_id='nidq')

# Also fails when running 
# se.read_spikeglx(folder_path=path, stream_id='imec0')
# se.read_spikeglx(folder_path=path)

No matter which stream_id I define, the error remains the same:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], [line 6](vscode-notebook-cell:?execution_count=3&line=6)
      [4](vscode-notebook-cell:?execution_count=3&line=4) # Doesn't work
      [5](vscode-notebook-cell:?execution_count=3&line=5) path = Path('D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0')
----> [6](vscode-notebook-cell:?execution_count=3&line=6) data = se.read_spikeglx(folder_path=path, stream_id='imec0')
      [8](vscode-notebook-cell:?execution_count=3&line=8) # Works in pre processing
      [9](vscode-notebook-cell:?execution_count=3&line=9) # path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0/catgt_SNA-137704_mstim_20112024_S1_1_g0'
     [10](vscode-notebook-cell:?execution_count=3&line=10) data = se.read_spikeglx(path)

File c:\Users\miksc\anaconda3\envs\NP\Lib\site-packages\spikeinterface\extractors\neoextractors\spikeglx.py:65, in SpikeGLXRecordingExtractor.__init__(self, folder_path, load_sync_channel, stream_id, stream_name, all_annotations, use_names_as_ids)
     [55](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:55) def __init__(
     [56](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:56)     self,
     [57](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:57)     folder_path,
   (...)
     [62](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:62)     use_names_as_ids: bool = False,
     [63](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:63) ):
     [64](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:64)     neo_kwargs = self.map_to_neo_kwargs(folder_path, load_sync_channel=load_sync_channel)
---> [65](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:65)     NeoBaseRecordingExtractor.__init__(
     [66](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:66)         self,
     [67](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:67)         stream_id=stream_id,
     [68](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:68)         stream_name=stream_name,
     [69](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:69)         all_annotations=all_annotations,
     [70](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:70)         use_names_as_ids=use_names_as_ids,
     [71](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:71)         **neo_kwargs,
...
--> [157](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:157)     info = self.signals_info_dict[0, stream_name]
    [159](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:159)     buffer_id = stream_name
    [160](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:160)     buffer_name = stream_name

KeyError: (0, 'nidq')

Interestingly, I am able to load the catGT filtered data when I run:

path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0/catgt_SNA-137704_mstim_20112024_S1_1_g0'
data = se.read_spikeglx(path)

Hopefully this is the right location for this issue. As I see g0 mentioned in the comment above I also removed the catgt folder's affix of '_g0' but this did not alter anything.

I am also willing to share any necessary additional information or data.

@tabedzki
Copy link
Author

Hi h-mayorquin,
Thanks for being willing to help and for providing a stubbing mechanism. Here is the stubbed data

@h-mayorquin
Copy link
Contributor

h-mayorquin commented Feb 5, 2025

Hi @tabedzki

The data that you shared with me does not have the binaries for the tcat processed lfp or nidq.
The script that I shared with you is a draft. I was expecting that we could take a look together andc finish it but now that I look at it again I think that you only need to modify the file_pattern: str = "*.ap.bin" argument in the create_data_stub function.

Note also, that you might need to modify the number of channels manually accordingo to your data.

Reach to me in slack if you need help with it.

@h-mayorquin
Copy link
Contributor

h-mayorquin commented Feb 5, 2025

@tabedzki, @samuelgarcia and I are wondering:

How come you only have the LF concatenated? We believe that typically, tcat is used for concatenation, so it’s confusing for us to see both concatenated and non-concatenated data in the same folder. Could tcat be used for something else? Do you happen to know?

@mik-schutte

Hi, your case is different since tcat is only present in one folder. Your case should work if you point the dirname of spikeglx rawio to the folder that either has tcat or does not. You can set dirname to either catgt_SNA-137704_mstim_20112024_S1_1_g0 or 137704_mstim_20112024_S1_1_g0_imec0, and that should work.

That said, we’d still like to understand why you have both tcat and non-tcat data at the same time. Is tcat used for something other than concatenation?

We need this context to make a good solution for this. We already have some ideas but we need your input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants