Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NwbTimeSeriesExtractor to load non-electrical series data from NWB #3587

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

h-mayorquin
Copy link
Collaborator

@h-mayorquin h-mayorquin commented Dec 17, 2024

Some users would like to access data from NWB that is not an ElectricalSeries to analyze it with SpikeInterface. TimeSeries data can be loaded lazily to Spikeinterface but it does not have all the ElectricalSeries infrastructure like electrodes so it can not be loaded by the current extractor. This PR adds another extractor that can be used to load TimeSeries data.

@steevelaquitaine @borrepp

@h-mayorquin h-mayorquin self-assigned this Dec 17, 2024
@h-mayorquin h-mayorquin changed the title Add NwbTimeSeriesExtractor to extract TimeSeries data from NWB Add NwbTimeSeriesExtractor to load non-electrical series data from NWB Dec 17, 2024
@alejoe91 alejoe91 added the extractors Related to extractors module label Jan 7, 2025
Copy link
Collaborator

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple cosmetic things.

src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved
and returns a list with their paths.
"""
if backend == "hdf5":
import h5py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how big h5py is or zarr for that matter. Is there any benefit to just import the exact thing we need (ie import h5py.Group rather than the full package?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most packages do not keep the type of discipline that you would need for this to make a difference. I venture to guess that it would not make a difference in this case but I don't have time to profile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly just didn't know the size of these packages. Something like scipy would be so slow and heavy for this kind of check, but if these are rather small/fast (which has been my experience playing with h5py at least) then it doesn't matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what I mean is that usually both of them are equivalent because importing one attribute imports the full module in most python packages. Anyway, here are the measurements:

@h-laptop$ conda activate work
(work) @h-laptop$ python -m timeit -s "import h5py" 
50000000 loops, best of 5: 5.43 nsec per loop
(work) @h-laptop$ python -m timeit -s "from h5py import Group" 
50000000 loops, best of 5: 6.33 nsec per loop
(work) @h-laptop$ python -m timeit -s "from zarr import Group" 
50000000 loops, best of 5: 5.39 nsec per loop
(work) @h-laptop$ python -m timeit -s "import zarr" 
50000000 loops, best of 5: 5.31 nsec per loop

If True, the time vector is loaded into the recording object. Useful when
precise timing information is needed.
samples_for_rate_estimation : int, default: 1000
The number of timestamp samples used for estimating the sampling rate when
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the writing. This would just be the number of timestamps right? I'm not quite sure what it would mean to say timestamp samples? But I don't know this format so I could be totally off on this :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, confusing, the thing is that the object/data container is called timestamps. That time series (timestamps) has samples.

Let me think this through.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I welcome any suggestion though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would need to see an example. Maybe I'll look at the test and see if I can better understand how this works. If it is confusing naming then we just have to live with it.

src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved
src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved
src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved
src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved
self.timeseries_path = list(time_series_dict.keys())[0]
else:
raise ValueError(
f"Multiple TimeSeries found! Specify 'timeseries_path'. Options: {list(time_series_dict.keys())}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extractors Related to extractors module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants