Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation of preprocessing and sorting split by channel group. #2316

Merged
1 change: 1 addition & 0 deletions doc/how_to/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,3 +8,4 @@ How to guides
analyse_neuropixels
handle_drift
load_matlab_data
process_by_channel_group
188 changes: 188 additions & 0 deletions doc/how_to/process_by_channel_group.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
Processing a Recording by Channel Group
=======================================

In this tutorial, we will walk through how to preprocess and sort a recording
separately for *channel groups*. A channel group is a subset of channels grouped by some
feature - for example a multi-shank Neuropixels recording in which the channels
are grouped by shank.

**Why preprocess by channel group?**
JoeZiminski marked this conversation as resolved.
Show resolved Hide resolved

Certain preprocessing steps depend on the spatial arrangement of the channels.
For example, common average referencing (CAR) averages over channels (separately for each time point)
and subtracts the average. In such a scenario it may make sense to group channels so that
this averaging is performed only over spatially close channel groups.

**Why sort by channel group?**

When sorting, we may want to completely separately channel groups so we can
consider their signals in isolation. If recording from a long
silicon probe, we might want to sort different brain areas separately,
for example using a different sorter for the hippocampus, the thalamus, or the cerebellum.


Splitting a Recording by Channel Group
--------------------------------------

In this example, we create a 384-channel recording with 4 shanks. However this could
be any recording in which the channel are grouped in some way, for example
a multi-tetrode recording with channel groups representing the channels on each individual tetrodes.

First, let's import the parts of SpikeInterface we need into Python, and generate our toy recording:

.. code-block:: python

import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
from spikeinterface import aggregate_channels
from probeinterface import generate_tetrode, ProbeGroup
import numpy as np

# Create a toy 384 channel recording with 4 shanks (each shank contain 96 channels)
recording, _ = se.toy_example(duration=[1.00], num_segments=1, num_channels=384)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question:
When do we use toy_example and we do we use generate_recording or generate_ground_truth_recording. I personally would prefer to only use the latter but maybe it does not matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure on this and will defer to @alejoe91 @samuelgarcia

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for docs toy_example is ok! It's actually easier to intepret

four_shank_groupings = np.repeat([0, 1, 2, 3], 96)
recording.set_property("group", four_shank_groupings)

print(recording.get_channel_groups())
"""
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
"""

We can split a recording into multiple recordings, one for each channel group, with the :py:func:`~split_by` method.

.. code-block:: python

split_recording_dict = recording.split_by("group")

Splitting a recording by channel group returns a dictionary containing separate recordings, one for each channel group:

.. code-block:: python

print(split_recording_dict)
"""
{0: ChannelSliceRecording: 96 channels - 30.0kHz - 1 segments - 30,000 samples - 1.00s - float32 dtype
10.99 MiB, 1: ChannelSliceRecording: 96 channels - 30.0kHz - 1 segments - 30,000 samples - 1.00s - float32 dtype
10.99 MiB, 2: ChannelSliceRecording: 96 channels - 30.0kHz - 1 segments - 30,000 samples - 1.00s - float32 dtype
10.99 MiB, 3: ChannelSliceRecording: 96 channels - 30.0kHz - 1 segments - 30,000 samples - 1.00s - float32 dtype
10.99 MiB}
"""
Preprocessing a Recording by Channel Group
------------------------------------------

The essence of preprocessing by channel group is to first split the recording
into separate recordings, perform the preprocessing steps, then aggregate
the channels back together.

In the below example, we loop over the split recordings, preprocessing each channel group
individually. At the end, we use the :py:func:`~aggregate_channels` function
to combine the separate channel group recordings back together.

.. code-block:: python

preprocessed_recordings = []

# loop over the recordings contained in the dictionary
for chan_group_rec in split_recordings_dict.values():

# Apply the preprocessing steps to the channel group in isolation
shifted_recording = spre.phase_shift(chan_group_rec)

filtered_recording = spre.bandpass_filter(shifted_recording)

referenced_recording = spre.common_reference(filtered_recording)

preprocessed_recordings.append(referenced_recording)

# Combine our preprocessed channel groups back together
combined_preprocessed_recording = aggregate_channels(preprocessed_recordings)

Now, when this recording is used in sorting, plotting, or whenever
calling its :py:func:`~get_traces` method, the data will have been
preprocessed separately per-channel group (then concatenated
back together under the hood).

It is strongly recommended to use the above structure to preprocess by channel group.

.. note::

The splitting and aggregation of channels for preprocessing is flexible.
Under the hood, :py:func:`~aggregate_channels` keeps track of when a recording was split. When
:py:func:`~get_traces` is called, the preprocessing is still performed per-group,
even though the recording is now aggregated.

To ensure data is preprocessed by channel group, the preprocessing step must be
applied separately to each split channel group recording.
For example, the below example will NOT preprocess by channel group:

.. code-block:: python

split_recording = recording.split_by("group")
split_recording_as_list = list(**split_recording.values())
combined_recording = aggregate_channels(split_recording_as_list)

# will NOT preprocess by channel group.
filtered_recording = common_reference(combined_recording)


In general, it is not recommended to apply :py:func:`~aggregate_channels` more than once.
This will slow down :py:func:`~get_traces` calls and may result in unpredictable behaviour.


Sorting a Recording by Channel Group
------------------------------------

We can also sort a recording for each channel group separately. It is not necessary to preprocess
a recording by channel group in order to sort by channel group.

There are two ways to sort a recording by channel group. First, we can split the preprocessed
recording (or, if it was already split during preprocessing as above, skip the :py:func:`~aggregate_channels` step
directly use the :py:func:`~split_recording_dict`).

**Option 1: Manual splitting**

In this example, similar to above we loop over all preprocessed recordings that
are grouped by channel, and apply the sorting separately. We store the
sorting objects in a dictionary for later use.

.. code-block:: python

split_preprocessed_recording = preprocessed_recording.split_by("group")

sortings = {}
for group, sub_recording in split_preprocessed_recording.items():
sorting = run_sorter(
sorter_name='kilosort2',
recording=split_preprocessed_recording,
output_folder=f"folder_KS2_group{group}"
)
sortings[group] = sorting

**Option 2 : Automatic splitting**

Alternatively, SpikeInterface provides a convenience function to sort the recording by property:

.. code-block:: python

aggregate_sorting = run_sorter_by_property(
sorter_name='kilosort2',
recording=preprocessed_recording,
grouping_property='group',
working_folder='working_path'
)
Loading