-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make a how to loading data into a Sorting manually #2944
Conversation
Fixes #2912 |
Can you add this to the how to index to see how it looks and review it? |
Yes of course! |
doc/how_to/make_a_sorting.rst
Outdated
^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Finally since SpikeInterface is tightly integrated with the Neo project you can create | ||
a sorting from :code:`Neo.SpikeTrain` objects. See Neo documentation for more information on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot how to do the auto-link so if someone can just remind me in review we should link to the Neo docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works like:
Please read the :doc:`Neo documentation<neo:index>`.
(currently we've got neo and probeinterface registered to be used like this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. I am wondering if we could enlist @GaelleChapuis to give this a reading as she was the one who asked about this recently.
New eyes are very helpful for didactic things.
doc/how_to/index.rst
Outdated
@@ -12,3 +12,4 @@ Guides on how to solve specific, short problems in SpikeInterface. Learn how to. | |||
load_matlab_data | |||
combine_recordings | |||
process_by_channel_group | |||
make_a_sorting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a title of the section "make a sorting" sounds to me like how to make a new sorting extractor programatically.
maybe "load your own sorting data to spikeinterface" or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want that for the toctree? The actual title isn't influenced by the file name at all. Are you worried we won't know which one is which in the future? I could definitely change but didn't want to be over verbose just for making the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. My intention is that people know what the how to is about from the title alone.
If you are concerned about too verbose maybe something in between like "load your sorting data" or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes for me, I agree to change the title itself as I was not 100% sure what Make your own Sorting
referred to. For the toctree / .rst filename make_a_sorting
I am okay with an abbreviated version of the title.
doc/how_to/make_a_sorting.rst
Outdated
are typically stored in samples/frames rather than in seconds. So you should input the times | ||
in samples/frames. The sampling_frequency allows for easily switching between samples and seconds. | ||
|
||
There are 3 options (along with making a NumpySorting from another sorting which will not be covered here): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the detail in the parenthesis is needed in a how to. I feel we don't need to be thoroug.
doc/how_to/make_a_sorting.rst
Outdated
With lists of spike trains and spike labels | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
In this case we need a list or array (or lists of lists for multisegment) of spike times, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we need a list or array (or lists of lists for multisegment) of spike times, | |
In this case we need a list or array (or lists of lists for multisegment) of spike frames, |
I think that times_list
is a bad argument name as it is prone to confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I would think we should say
spike times (in frames) or spike times in frames instead. Saying spike times is common in the field to mean in frames or in seconds so I think it would be more standard in the field to say spike times and then specify the units. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you are in a better position to make that call than me. I trust you in this one.
doc/how_to/make_a_sorting.rst
Outdated
With a unit dictionary | ||
^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
We can also use a dictionary where each unit is a key and its spike times are values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also use a dictionary where each unit is a key and its spike times are values. | |
We can also use a dictionary where each unit is a key and a list of spike frames are passed as values. |
In the same spirit of time vs frames distinction above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above and then we can decide!
Co-authored-by: Heberto Mayorquin <[email protected]>
@GaelleChapuis would be more than welcome to comment as well! Note to self: add the :doc: sphinx before merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! Thanks Zach!
doc/how_to/make_a_sorting.rst
Outdated
the spike times (i.e. when the neurons were actually firing) the unit labels (i.e. | ||
who the spikes belong to. Also called cluster ids by some sorters), the unit ids (the unique | ||
set of unit labels) and the sampling_frequency. To make your own :code:`Sorting` object you can | ||
use :code:`NumpySorting`. It is important to note that in SpikeiInterface spike trains are handled internally in samples/frames rather than in seconds and we use the sampling frequency to ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfinished sentence. Think you need to delete "are handled internally in samples/frames rather than in seconds and we use the sampling frequency to..." and then it connects to the next sentence.
doc/how_to/make_a_sorting.rst
Outdated
.. code-block:: python | ||
|
||
from spikeinterface.core import NumpySorting | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Black formatting would enhance the beauty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
black doesn't run on rst unless you have a way. So I tried to mimic it....
doc/how_to/make_a_sorting.rst
Outdated
|
||
my_sorting = NumpySorting.from_unit_dict(units_dict_list={'0': [1,3], | ||
'1': [2,4] | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Black formatting would enhance the beauty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here also we should show an example with multi segment in mind.
doc/how_to/make_a_sorting.rst
Outdated
|
||
# neo_spiketrain is a Neo spiketrain object | ||
my_sorting = NumpySorting.from_neo_spiketrain_list(neo_spiketrain, | ||
sampling_frequency=30_000.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a line, just to wrap up the guide. Something like:
"
Now that you've created a Sorting object, you can combine it with a recording to make a :ref:Sorting Analyzer <sphx_glr_tutorials_core_plot_4_sorting_analyzer.py>
, or start visualising by using the :py:func:~spikeinterface.widgets.plot_crosscorrelograms
function.
"
But up to you!
@chrishalcrow thanks for all the feedback. I might get to it today, but otherwise I'm in the recording studio all week so hopefully I can incorporate all your fixes on Friday! |
doc/how_to/make_a_sorting.rst
Outdated
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], | ||
labels_list = [0,1,0,1], | ||
sampling_frequency = 30_000.0 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use np.array for spike trains.
And more importantly we should have the multi segment cases directly here.
In short
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], | |
labels_list = [0,1,0,1], | |
sampling_frequency = 30_000.0 | |
) | |
my_sorting = NumpySorting.from_times_labels(times_list = [np.array( [1,2,3,4])], | |
labels_list = [np.array([0,1,0,1])], | |
sampling_frequency = 30_000.0 | |
) |
with a celar exlapnation of the multi segment story.
No ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For np.arrays I agree. That's fine.
I was purposely avoiding the multisegment because people said that is confusing for people working with mono segments. But I can add. Maybe one example lower down? That way we have an easy example and people can shut off their brains if they don't need the multisegment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for the mono segment without list but this must be explicit in the paragraph somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep will do. My plan is to incorporate all changes on Friday when I can sit and do it carefully :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also believe that in general is better to explain things in the mono segment and then we can mention the multi segment after. Is a complexity that is probably not necessary for most cases and concepts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @zm711 this is awesome! I did not know about this functionality, its really nice and this is a very useful explaination. I've added some suggestions!
doc/how_to/make_a_sorting.rst
Outdated
Make your own Sorting | ||
===================== | ||
|
||
Why make a :code:`Sorting`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the Sorting
link to the API for sorting? I think this is possible in RTD hopefully as so. Maybe it is overkill for every reference, but maybe the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said, this is not done elsewhere in the docs so maybe an issue / PR for another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm bad at that for RTD. So if you have an idea definitely suggest. Linking is the bane of my existence, but I like it!
doc/how_to/make_a_sorting.rst
Outdated
|
||
Why make a :code:`Sorting`? | ||
|
||
The :code:`Sorting` object is one of the core objects within the SpikeInterface library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is really nice, at the moment it leads with references to the Sorting
object, my suggestion would be to start by motivating it from the perspective of the user and their problem. I think the Sorting
object is important but it is more a means-to-an-end that interesting in its own right. e.g. (just an example of motivation / order rather than specific content):
SpikeInterface contains pre-build readers for the output of many common sorters. However, what if you have sorting output that is not in a standard format (e.g. old csv file)? If this is the case you can make your own Sorting
object to load your data into SpikeInterface. This means you can still easily apply various downstream analyses to your results (e.g. building correlograms or for generating a SortingAnalyzer
).
The Sorting
object is a core object within SpikeInterface that acts as a convenient way to interface with sorting results, no matter which sorter was used to generate them. At a fundamental level it is a series of spike times and a series of labels for each spike along with some associated metadata. Below, we will show you have to take your existing data and load it as a SpikeInterface Sorting
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something in bold to state how easy this is (based on the really nice examples below, amazing you just need times, labels)!
All you need to load your own sorting output into spike interface is a list of spike times and associated unit IDs.
doc/how_to/make_a_sorting.rst
Outdated
Making a :code:`Sorting` | ||
------------------------ | ||
|
||
For most formats the :code:`Sorting` is automatically generated. For example one could do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'formats' -> 'sorting output formats'?
doc/how_to/make_a_sorting.rst
Outdated
|
||
# For kilosort/phy files we can use either reader | ||
ks_sorting = read_kilosort('path/to/folder') | ||
phy_sorting = read_phy('path/to/folder') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just ks_sorting
example to keep the example focused? Maybe For example, if one had run sorting using Kilosort, you would load the sorting results into SpikeInterface with:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. I prefer focused.
doc/how_to/make_a_sorting.rst
Outdated
phy_sorting = read_phy('path/to/folder') | ||
|
||
This :code:`Sorting` contains important information about your spike trains including | ||
the spike times (i.e. when the neurons were actually firing) the unit labels (i.e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe list these as bullet points?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would fold the discussion of frames into the definition of spike times. e.g.
The `Sorting object contains important information about your spike trains. You will need to provide the:
- spike times... Note these must be specified in samples. They will be converted to times under the hood using the provided sampling_frequency
- Unit its ...
- samplign frequency ...
doc/how_to/make_a_sorting.rst
Outdated
are typically stored in samples/frames rather than in seconds. So you should input the times | ||
in samples/frames. The sampling_frequency allows for easily switching between samples and seconds. | ||
|
||
There are 3 options (along with making a NumpySorting from another sorting which will not be covered here): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be slightly more specifc here e..g. 'There are three options for how you format the spike times, unit labels and sampling frequency that are passed to SpikeInterface. This can be as a list, a dictionary, or with Neo SpikeTrains. Below we will look at an example of each in turn."
Also, I wonder if there is another way to refer to 'spike times' because it is confusing as they are specified in samples. 'Spike sample indicies?' I'm not sure, spike times rolls of the toungue but it really implies these should be formatted in time units.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the discussion Heberto and I had above. I would argue the field is used to the spike times term so I think we say your spike times in samples to be clearer.
doc/how_to/make_a_sorting.rst
Outdated
from spikeinterface.core import NumpySorting | ||
|
||
# in this case we are making a monosegment sorting | ||
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more realistic times might make this clearer. Maybe above, you could have like "Say you had only four spikes in your dataset, at samples 1000, 12000, 15000, 22000 from two different units. With a sampling_frequency of XXX, the actual spike times when converted under the hood in spike interface would be XXXX."
Then, each below example can re-use the previously exaplined example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I was being lazy :)
doc/how_to/make_a_sorting.rst
Outdated
# in this case we are making a monosegment sorting | ||
my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], | ||
labels_list = [0,1,0,1], | ||
sampling_frequency = 30_000.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this underscore syntax correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is. You can put underscores into any number for spacing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯 cool!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very nice, man!
I think I got most comments, but another review would be great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK to me. Thanks a lot @zm711 . I am not approving only because you still have as draft.
Load Your Own Data into a Sorting | ||
================================= | ||
|
||
Why make a :code:`Sorting`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As usual, your why sections are great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeZiminski edited it so hats off for his assist.
|
||
* spike times: the peaks of the extracellular potentials expressed in samples/frames these can | ||
be converted to seconds under the hood using the sampling_frequency | ||
* spike labels: the neuron id for each spike, can also be called cluster ids or unit ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to myself. I never thought about them this way. I always thought that the units have a label and a spike train but I think I had the "spike dictionary" representation too prominent in my mind. This makes more sense in the context of the spike vector.
Co-authored-by: Heberto Mayorquin <[email protected]>
Co-authored-by: Heberto Mayorquin <[email protected]>
@chrishalcrow, we await your read! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hey @zm711 looks great!! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! Got some beauty requests and a link to update, but otherwise it's great.
@h-mayorquin just in case you didn't see this the profile imports on Mac took longer than 1.65 seconds. I re-reran and it went away. So this test has some randomness--I would assume based on the runner that picks it up, but I don't know. |
Thanks @zm711 . I was probably too ambitious with the test. I will set it higher : ) |
This is just a first draft. I expect a good cleanup!
@JoeZiminski, feel free to comment as well.