Make a how to loading data into a Sorting manually #2944

Merged
merged 11 commits into from
Jun 7, 2024


zm711 (Collaborator) commented May 31, 2024

This is just a first draft. I expect a good cleanup!

@JoeZiminski, feel free to comment as well.

zm711 requested review from h-mayorquin and chrishalcrow May 31, 2024 08:38
zm711 added the documentation label (Improvements or additions to documentation) May 31, 2024
zm711 (Collaborator Author) commented May 31, 2024

Fixes #2912

zm711 linked an issue May 31, 2024 that may be closed by this pull request
h-mayorquin (Collaborator)

Can you add this to the how to index to see how it looks and review it?

zm711 (Collaborator Author) commented May 31, 2024

Yes of course!


Finally, since SpikeInterface is tightly integrated with the Neo project, you can create
a sorting from :code:`neo.SpikeTrain` objects. See Neo documentation for more information on
Collaborator Author

I forgot how to do the auto-link, so if someone can remind me in review, we should link to the Neo docs.

chrishalcrow (Collaborator) May 31, 2024

Works like:

Please read the :doc:`Neo documentation<neo:index>`.

(Currently, neo and probeinterface are registered in the intersphinx mapping, so they can be used like this.)
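For reference, a role like :doc:`Neo documentation<neo:index>` resolves through Sphinx's intersphinx extension, which maps a prefix (here `neo`) to another project's published docs. A minimal sketch of the relevant `conf.py` entries follows; the documentation URLs are assumptions, and SpikeInterface's actual configuration may differ.

```python
# Sketch of Sphinx conf.py entries. The URLs are assumptions, not copied
# from SpikeInterface's own configuration.
extensions = [
    "sphinx.ext.intersphinx",  # enables cross-project refs such as neo:index
]

# Each key is the prefix used inside roles, e.g. :doc:`Neo documentation<neo:index>`.
# Each value is (base URL of the other project's docs, inventory location);
# None tells Sphinx to fetch objects.inv from that base URL.
intersphinx_mapping = {
    "neo": ("https://neo.readthedocs.io/en/latest/", None),
    "probeinterface": ("https://probeinterface.readthedocs.io/en/main/", None),
}
```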

h-mayorquin (Collaborator) left a comment

This is great. I am wondering if we could enlist @GaelleChapuis to give this a reading as she was the one who asked about this recently.

New eyes are very helpful for didactic things.

@@ -12,3 +12,4 @@ Guides on how to solve specific, short problems in SpikeInterface. Learn how to.
load_matlab_data
combine_recordings
process_by_channel_group
make_a_sorting
Collaborator

As a section title, "make a sorting" sounds to me like a guide on how to programmatically create a new sorting extractor.

Maybe "load your own sorting data to spikeinterface", or something else?

Collaborator Author

Do you want that for the toctree? The actual title isn't influenced by the file name at all. Are you worried we won't know which one is which in the future? I could definitely change it, but I didn't want to be overly verbose just for making the index.

Collaborator

Yes. My intention is that people know what the how to is about from the title alone.

If you are concerned about it being too verbose, maybe something in between, like "load your sorting data" or similar?

Collaborator

Yes, I agree with changing the title itself, as I was not 100% sure what "Make your own Sorting" referred to. For the toctree / .rst filename make_a_sorting, I am okay with an abbreviated version of the title.

doc/how_to/make_a_sorting.rst (outdated, resolved)
are typically stored in samples/frames rather than in seconds. So you should input the times
in samples/frames. The sampling_frequency allows for easily switching between samples and seconds.

There are 3 options (along with making a NumpySorting from another sorting which will not be covered here):
Collaborator

I don't think the detail in the parentheses is needed in a how-to. I feel we don't need to be thorough.
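The quoted paragraph says spike times are passed in samples/frames and that the sampling frequency converts between samples and seconds. A minimal plain-Python sketch of that arithmetic follows; the helper names are hypothetical (they are not SpikeInterface functions), and 30 kHz is assumed because it is the rate used in the draft's examples.

```python
def samples_to_seconds(sample_indices, sampling_frequency):
    """Convert spike sample indices (frames) to times in seconds."""
    return [s / sampling_frequency for s in sample_indices]


def seconds_to_samples(times_s, sampling_frequency):
    """Convert times in seconds back to the nearest integer sample index."""
    # round() absorbs floating-point error such as 2.9999999999999996
    return [round(t * sampling_frequency) for t in times_s]


sampling_frequency = 30_000.0  # Hz (assumed)
spike_samples = [30_000, 45_000, 60_000]
spike_seconds = samples_to_seconds(spike_samples, sampling_frequency)
recovered = seconds_to_samples(spike_seconds, sampling_frequency)
```

Going from seconds to samples needs a rounding step, while samples to seconds is a plain division, which is one practical advantage of treating integer sample indices as the stored form.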

With lists of spike trains and spike labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this case we need a list or array (or lists of lists for multisegment) of spike times,
Collaborator

Suggested change:
- In this case we need a list or array (or lists of lists for multisegment) of spike times,
+ In this case we need a list or array (or lists of lists for multisegment) of spike frames,

I think that times_list is a bad argument name as it is prone to confusion.

Collaborator Author

I agree. I think we should say

"spike times (in frames)" or "spike times in frames" instead. "Spike times" is commonly used in the field to mean either frames or seconds, so I think it is more standard to say "spike times" and then specify the units. What do you think?

Collaborator

I think that you are in a better position to make that call than me. I trust you in this one.
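To make the times/labels layout concrete: the two sequences are parallel, so spike i occurs at sample `times[i]` and belongs to unit `labels[i]`. The plain-Python sketch below uses a hypothetical helper, `spike_train_for_unit`, to pull one unit's spike train back out of that layout, using the toy values from the draft's mono-segment example.

```python
def spike_train_for_unit(times, labels, unit_id):
    """Return the sample indices of every spike whose label equals unit_id."""
    if len(times) != len(labels):
        raise ValueError("times and labels must have one entry per spike")
    return [t for t, lab in zip(times, labels) if lab == unit_id]


# Toy mono-segment data from the draft: four spikes, two units
times = [1, 2, 3, 4]   # spike times in samples (frames)
labels = [0, 1, 0, 1]  # unit label of each spike

unit0_train = spike_train_for_unit(times, labels, 0)
unit1_train = spike_train_for_unit(times, labels, 1)
```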

With a unit dictionary
^^^^^^^^^^^^^^^^^^^^^^

We can also use a dictionary where each unit is a key and its spike times are values.
Collaborator

Suggested change:
- We can also use a dictionary where each unit is a key and its spike times are values.
+ We can also use a dictionary where each unit is a key and a list of spike frames are passed as values.

In the same spirit as the time vs. frames distinction above.

Collaborator Author

See comment above and then we can decide!
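The unit-dictionary form holds the same information as the times/labels form, just grouped by unit. The plain-Python sketch below uses a hypothetical helper, `unit_dict_to_times_labels`, to flatten the draft's `{'0': [1, 3], '1': [2, 4]}` example back into parallel sequences ordered by spike time. How ties are ordered is an assumption: Python's sort is stable, so spikes at the same sample keep dictionary order.

```python
def unit_dict_to_times_labels(unit_dict):
    """Flatten {unit_id: spike samples} into parallel times/labels lists."""
    pairs = [
        (sample, unit_id)
        for unit_id, samples in unit_dict.items()
        for sample in samples
    ]
    pairs.sort(key=lambda pair: pair[0])  # order spikes by time (stable sort)
    times = [sample for sample, _ in pairs]
    labels = [unit_id for _, unit_id in pairs]
    return times, labels


units = {"0": [1, 3], "1": [2, 4]}  # unit ids as keys, spike samples as values
times, labels = unit_dict_to_times_labels(units)
```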

alejoe91 added the hackathon-24 label (Contributions during the SpikeInterface Hackathon May 24) Jun 1, 2024
Co-authored-by: Heberto Mayorquin
zm711 (Collaborator Author) commented Jun 1, 2024

@GaelleChapuis would be more than welcome to comment as well!

Note to self: add the :doc: Sphinx link before merge.

chrishalcrow (Collaborator) left a comment

Beautiful! Thanks Zach!

the spike times (i.e. when the neurons were actually firing) the unit labels (i.e.
who the spikes belong to. Also called cluster ids by some sorters), the unit ids (the unique
set of unit labels) and the sampling_frequency. To make your own :code:`Sorting` object you can
use :code:`NumpySorting`. It is important to note that in SpikeInterface spike trains are handled internally in samples/frames rather than in seconds and we use the sampling frequency to ...
Collaborator

Unfinished sentence. I think you need to delete "are handled internally in samples/frames rather than in seconds and we use the sampling frequency to...", and then it connects to the next sentence.
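The quoted draft defines the unit ids as the unique set of unit labels. In plain Python that relationship, plus a per-unit spike count, can be sketched as follows; the toy labels are made up, and sorting the ids is an assumption made here for a predictable order.

```python
from collections import Counter

labels = [0, 1, 0, 1, 2, 0]  # one unit label per spike (toy data)

# Unit ids: each distinct label once, sorted for a predictable order
unit_ids = sorted(set(labels))

# Number of spikes belonging to each unit
spike_counts = Counter(labels)
```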

.. code-block:: python

from spikeinterface.core import NumpySorting

Collaborator

Black formatting would enhance the beauty.

Collaborator Author

Black doesn't run on RST unless you know a way, so I tried to mimic it...


my_sorting = NumpySorting.from_unit_dict(units_dict_list={'0': [1,3],
'1': [2,4]
},
Collaborator

Black formatting would enhance the beauty

Member

Here, too, we should show an example with multi-segment data in mind.
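For the multi-segment case requested here, the argument name `units_dict_list` in the draft suggests one unit dictionary per segment. A plain-Python sketch of that layout with toy values follows; it assumes each segment's sample indices count from that segment's own start, and it only inspects the structure rather than calling SpikeInterface.

```python
# One {unit_id: spike samples} dict per segment (toy values).
# Sample indices are assumed to be relative to the start of each segment.
units_dict_list = [
    {"0": [1, 3], "1": [2, 4]},  # segment 0
    {"0": [5], "1": [6, 7]},     # segment 1
]

num_segments = len(units_dict_list)

# Union of the unit ids seen in any segment
unit_ids = sorted(set().union(*(d.keys() for d in units_dict_list)))

# Total spikes per unit across all segments
total_spikes = {
    unit_id: sum(len(d.get(unit_id, [])) for d in units_dict_list)
    for unit_id in unit_ids
}
```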


# neo_spiketrain is a Neo spiketrain object
my_sorting = NumpySorting.from_neo_spiketrain_list(neo_spiketrain,
sampling_frequency=30_000.0)
Collaborator

I'd add a line, just to wrap up the guide. Something like:

"
Now that you've created a Sorting object, you can combine it with a recording to make a :ref:`Sorting Analyzer <sphx_glr_tutorials_core_plot_4_sorting_analyzer.py>`, or start visualising by using the :py:func:`~spikeinterface.widgets.plot_crosscorrelograms` function.
"

But up to you!

zm711 (Collaborator Author) commented Jun 3, 2024

@chrishalcrow thanks for all the feedback. I might get to it today, but otherwise I'm in the recording studio all week so hopefully I can incorporate all your fixes on Friday!

Comment on lines 52 to 55
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
labels_list = [0,1,0,1],
sampling_frequency = 30_000.0
)
Member

I think we should use np.array for spike trains.
And more importantly we should have the multi segment cases directly here.

In short

Suggested change:
- my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
- labels_list = [0,1,0,1],
- sampling_frequency = 30_000.0
- )
+ my_sorting = NumpySorting.from_times_labels(times_list = [np.array( [1,2,3,4])],
+ labels_list = [np.array([0,1,0,1])],
+ sampling_frequency = 30_000.0
+ )

with a clear explanation of the multi-segment story.
No?

Collaborator Author

For np.arrays I agree. That's fine.

I was purposely avoiding multisegment because people said it is confusing for those working with mono segments. But I can add it. Maybe one example lower down? That way we have an easy example, and people can shut off their brains if they don't need multisegment?

Member

OK for the mono-segment case without a list, but this must be made explicit in the paragraph somewhere.

Collaborator Author

Yep will do. My plan is to incorporate all changes on Friday when I can sit and do it carefully :)

Collaborator

I also believe that, in general, it is better to explain things for the mono-segment case and then mention the multi-segment case afterwards. It is a complexity that is probably not necessary for most cases and concepts.
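The mono-segment/multi-segment distinction in this thread comes down to one extra level of nesting: a multi-segment input is a list with one sequence per segment, and a mono-segment input can be wrapped as a one-element list, as in the suggested `times_list = [np.array([1,2,3,4])]`. A plain-Python sketch with a hypothetical helper, `as_segment_list`, follows; plain lists stand in for NumPy arrays here.

```python
def as_segment_list(values):
    """Wrap a flat mono-segment sequence as a one-segment list.

    A sequence whose first element is itself a list or tuple is treated as
    already multi-segment and returned unchanged (as a list).
    """
    if values and isinstance(values[0], (list, tuple)):
        return list(values)
    return [list(values)]


mono_times = [1, 2, 3, 4]               # one segment, no extra nesting
multi_times = [[1, 2, 3, 4], [10, 20]]  # two segments

mono_wrapped = as_segment_list(mono_times)
multi_unchanged = as_segment_list(multi_times)
```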

JoeZiminski (Collaborator) left a comment

Hey @zm711, this is awesome! I did not know about this functionality; it's really nice, and this is a very useful explanation. I've added some suggestions!

Make your own Sorting
=====================

Why make a :code:`Sorting`?
Collaborator

Could the Sorting link to the API for sorting? I think this is possible in RTD, hopefully. Maybe it is overkill for every reference, but maybe the first?

Collaborator

That being said, this is not done elsewhere in the docs so maybe an issue / PR for another time.

Collaborator Author

I'm bad at that for RTD, so if you have an idea, definitely suggest it. Linking is the bane of my existence, but I like it!


Why make a :code:`Sorting`?

The :code:`Sorting` object is one of the core objects within the SpikeInterface library
Collaborator

This section is really nice. At the moment it leads with references to the Sorting object; my suggestion would be to start by motivating it from the perspective of the user and their problem. I think the Sorting object is important, but it is more a means to an end than something interesting in its own right, e.g. (just an example of motivation / order rather than specific content):

SpikeInterface contains pre-built readers for the output of many common sorters. However, what if you have sorting output that is not in a standard format (e.g. an old CSV file)? If this is the case, you can make your own Sorting object to load your data into SpikeInterface. This means you can still easily apply various downstream analyses to your results (e.g. building correlograms or generating a SortingAnalyzer).

The Sorting object is a core object within SpikeInterface that acts as a convenient way to interface with sorting results, no matter which sorter was used to generate them. At a fundamental level it is a series of spike times and a series of labels for each spike, along with some associated metadata. Below, we will show you how to take your existing data and load it as a SpikeInterface Sorting object.

Collaborator

Maybe something in bold to state how easy this is (based on the really nice examples below; it's amazing that you just need times and labels)!

All you need to load your own sorting output into SpikeInterface is a list of spike times and associated unit IDs.
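As a sketch of the "old CSV file" scenario mentioned above: suppose each row holds one spike's sample index and cluster id. The column names and file layout here are hypothetical, and `io.StringIO` stands in for a real file. The result is a pair of parallel times/labels lists, the kind of input the draft's `NumpySorting.from_times_labels` example takes through `times_list` and `labels_list`.

```python
import csv
import io

# Stand-in for an old CSV export; the column names are hypothetical
csv_text = """sample_index,cluster_id
15000,0
1000,0
22000,1
12000,1
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
rows.sort(key=lambda row: int(row["sample_index"]))  # order spikes by time

times = [int(row["sample_index"]) for row in rows]  # spike times in samples
labels = [int(row["cluster_id"]) for row in rows]   # unit label per spike
```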

Making a :code:`Sorting`
------------------------

For most formats the :code:`Sorting` is automatically generated. For example one could do
Collaborator

'formats' -> 'sorting output formats'?


# For kilosort/phy files we can use either reader
ks_sorting = read_kilosort('path/to/folder')
phy_sorting = read_phy('path/to/folder')
Collaborator

Maybe just the ks_sorting example, to keep the example focused? Maybe: "For example, if one had run sorting using Kilosort, you would load the sorting results into SpikeInterface with:"

Collaborator Author

Fair. I prefer focused.

phy_sorting = read_phy('path/to/folder')

This :code:`Sorting` contains important information about your spike trains including
the spike times (i.e. when the neurons were actually firing) the unit labels (i.e.
Collaborator

Maybe list these as bullet points?

Collaborator

I would fold the discussion of frames into the definition of spike times. e.g.

The `Sorting` object contains important information about your spike trains. You will need to provide the:

  • spike times... Note these must be specified in samples. They will be converted to times under the hood using the provided sampling_frequency
  • unit ids ...
  • sampling frequency ...

are typically stored in samples/frames rather than in seconds. So you should input the times
in samples/frames. The sampling_frequency allows for easily switching between samples and seconds.

There are 3 options (along with making a NumpySorting from another sorting which will not be covered here):
Collaborator

Could be slightly more specific here, e.g. "There are three options for how you format the spike times, unit labels and sampling frequency that are passed to SpikeInterface. This can be as a list, a dictionary, or with Neo SpikeTrains. Below we will look at an example of each in turn."

Also, I wonder if there is another way to refer to 'spike times', because it is confusing when they are specified in samples. 'Spike sample indices'? I'm not sure; 'spike times' rolls off the tongue, but it really implies these should be formatted in time units.

Collaborator Author

This was the discussion Heberto and I had above. I would argue the field is used to the term "spike times", so I think we should say "your spike times in samples" to be clearer.

from spikeinterface.core import NumpySorting

# in this case we are making a monosegment sorting
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
Collaborator

Some more realistic times might make this clearer. Maybe above you could have something like "Say you had only four spikes in your dataset, at samples 1000, 12000, 15000, 22000 from two different units. With a sampling_frequency of XXX, the actual spike times when converted under the hood in SpikeInterface would be XXXX."

Then each example below can reuse the previously explained example?

Collaborator Author

That's fair. I was being lazy :)
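Working the reviewer's scenario through with numbers: four spikes at samples 1000, 12000, 15000 and 22000 from two units, with an assumed sampling frequency of 30 kHz (the rate used in the draft's code). Dividing each sample index by the sampling frequency gives roughly 0.033 s, 0.4 s, 0.5 s and 0.733 s. The unit assignment below (alternating, as in the draft's `[0,1,0,1]` labels) is an assumption for illustration.

```python
sampling_frequency = 30_000.0  # Hz (assumed)

times = [1000, 12000, 15000, 22000]  # spike times in samples
labels = [0, 1, 0, 1]                # assumed unit of each spike

# Spike times in seconds: sample index divided by sampling frequency
times_s = [t / sampling_frequency for t in times]

# Group the converted times by unit
times_s_by_unit = {}
for t_s, unit in zip(times_s, labels):
    times_s_by_unit.setdefault(unit, []).append(t_s)
```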

# in this case we are making a monosegment sorting
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
labels_list = [0,1,0,1],
sampling_frequency = 30_000.0
Collaborator

is this underscore syntax correct?

Collaborator Author

Yes, it is. You can put underscores between the digits of any numeric literal for readability (Python 3.6+) :)

Collaborator

🤯 cool!!

Collaborator

It is very nice, man!
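For anyone else surprised by `30_000.0`: since Python 3.6 (PEP 515), underscores may sit between the digits of a numeric literal purely for readability, and they do not change the value. The same PEP also lets `int()` and `float()` accept underscores in strings. A quick check:

```python
sampling_frequency = 30_000.0  # the literal used in the draft
plain = 30000.0

same_value = sampling_frequency == plain  # underscores are ignored
int_literal = 1_000_000                   # works for integers too
parsed = int("30_000")                    # int() accepts them in strings as well
```

Underscores are only allowed between digits: a leading, trailing or doubled underscore (for example `30__000`) is a syntax error.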

zm711 (Collaborator Author) commented Jun 6, 2024

I think I got most comments, but another review would be great.

zm711 changed the title from "Make a how to for creating a Sorting" to "Make a how to loading data into a Sorting manually" Jun 6, 2024

h-mayorquin (Collaborator) left a comment

This is OK with me. Thanks a lot, @zm711. I am not approving only because you still have it as a draft.

doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
Load Your Own Data into a Sorting
=================================

Why make a :code:`Sorting`?
Collaborator

As usual, your why sections are great!

Collaborator Author

@JoeZiminski edited it, so hats off for his assist.

doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)

* spike times: the peaks of the extracellular potentials expressed in samples/frames these can
be converted to seconds under the hood using the sampling_frequency
* spike labels: the neuron id for each spike, can also be called cluster ids or unit ids
h-mayorquin (Collaborator) Jun 6, 2024

Comment to myself: I never thought about them this way. I always thought that the units have a label and a spike train, but I think I had the "spike dictionary" representation too prominent in my mind. This makes more sense in the context of the spike vector.
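The spike vector mentioned here stores one record per spike rather than one train per unit, which is why a per-spike times/labels view fits it naturally. Below is a plain-Python sketch of that idea. It assumes field names matching SpikeInterface's spike vector (`sample_index`, `unit_index`, `segment_index`) and that `unit_index` is a position in `unit_ids`; the real spike vector is a NumPy structured array, not the `NamedTuple` list used here, and the ordering by segment then sample is an assumption.

```python
from typing import NamedTuple


class Spike(NamedTuple):
    sample_index: int   # spike time in samples
    unit_index: int     # position of the unit in unit_ids (not the id itself)
    segment_index: int  # which segment the spike belongs to


times = [1, 2, 3, 4]
labels = ["a", "b", "a", "b"]  # arbitrary unit labels (toy data)
unit_ids = sorted(set(labels))
index_of = {unit_id: i for i, unit_id in enumerate(unit_ids)}

# One record per spike (mono-segment case), ordered by segment then sample
spike_vector = sorted(
    (Spike(t, index_of[lab], 0) for t, lab in zip(times, labels)),
    key=lambda spike: (spike.segment_index, spike.sample_index),
)
```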

doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
zm711 marked this pull request as ready for review June 6, 2024 16:14
zm711 (Collaborator Author) commented Jun 6, 2024

@chrishalcrow, we await your read!

h-mayorquin (Collaborator) left a comment

LGTM

JoeZiminski (Collaborator)

Hey @zm711 looks great!! 🚀

chrishalcrow (Collaborator) left a comment

Amazing! Got some beauty requests and a link to update, but otherwise it's great.

doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
doc/how_to/load_your_data_into_sorting.rst (resolved)
doc/how_to/load_your_data_into_sorting.rst (resolved)
doc/how_to/load_your_data_into_sorting.rst (outdated, resolved)
zm711 requested a review from chrishalcrow June 7, 2024 11:03
zm711 (Collaborator Author) commented Jun 7, 2024

@h-mayorquin, just in case you didn't see this: the profile imports on Mac took longer than 1.65 seconds. I re-ran it and the failure went away, so this test has some randomness, I would assume depending on the runner that picks it up, but I don't know.

samuelgarcia merged commit 2a4bc57 into SpikeInterface:main Jun 7, 2024
11 checks passed
zm711 deleted the make-a-sorting-doc branch June 7, 2024 13:48
h-mayorquin (Collaborator)

Thanks, @zm711. I was probably too ambitious with the test. I will set it higher :)

Labels
documentation (Improvements or additions to documentation), hackathon-24 (Contributions during the SpikeInterface Hackathon May 24)
Development

Successfully merging this pull request may close these issues.

Add "how to" for how to load your own data as a sorting object
6 participants