
Add documentation of preprocessing and sorting split by channel group. #2316

Merged

Conversation

JoeZiminski
Collaborator

@JoeZiminski JoeZiminski commented Dec 8, 2023

This PR adds a 'How to' page on preprocessing and sorting when splitting the recording into channel groups. I could not get the formatting (e.g. inclusion of notes, warning sections) to work with the sphinx-gallery approach as described in this example. I am not 100% sure why; I thought to have the documentation reviewed and the best place for it decided before persevering with this.
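
Roughly, the workflow the page documents looks like this (a minimal sketch, not the exact code from the page, assuming the standard spikeinterface splitting/aggregation API; the filter settings and group assignment are just placeholders):

import numpy as np
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
from spikeinterface.core import aggregate_channels

# Toy multi-shank recording; assign each block of 96 channels to a group (shank)
recording, _ = se.toy_example(duration=[1.00], num_segments=1, num_channels=384)
recording.set_channel_groups(np.repeat([0, 1, 2, 3], 96))

# Split into one sub-recording per channel group
split_recordings = recording.split_by("group")  # dict: group id -> recording

# Preprocess each group independently, e.g. so the reference is computed per shank
preprocessed = {}
for group_id, group_recording in split_recordings.items():
    filtered = spre.bandpass_filter(group_recording, freq_min=300, freq_max=6000)
    preprocessed[group_id] = spre.common_reference(filtered, operator="median")

# Re-aggregate into one recording that keeps the per-group preprocessing,
# or alternatively run the sorter on each group separately
combined = aggregate_channels(list(preprocessed.values()))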

I also had some general questions

  • Does probeinterface have a way to show only certain channel groups? If possible I wanted to also include a plot of the different channel groups, highlighted by colour or something (see the sketch after this list).
  • Sphinx is pinned to an older version, and I can see why, as sphinx version changes are a nightmare. Nonetheless the newer versions have some cool features (like dropdowns; currently what would be a dropdown is included as a note at the end). At least, I think the version is why this is not working. Is it worth considering unpinning this dependency?
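
For the first question, this is the kind of plot I had in mind. A sketch only, continuing from the sketch above, and assuming plot_probe accepts per-contact colors via contacts_colors and that the probe's contact order matches the recording's channel order:

import matplotlib.pyplot as plt
from probeinterface.plotting import plot_probe

probe = recording.get_probe()
groups = recording.get_channel_groups()

# one color per contact, chosen by its channel group
cmap = plt.get_cmap("tab10")
contact_colors = [cmap(int(group) % 10) for group in groups]

fig, ax = plt.subplots()
plot_probe(probe, ax=ax, contacts_colors=contact_colors)
plt.show()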

@JoeZiminski JoeZiminski changed the title Split by recording first commit. Add documentatino of preprocessing and sorting split by channel group. Dec 8, 2023
@zm711
Collaborator

zm711 commented Dec 8, 2023

So if the goal of this is to be a full tutorial run locally and converted to .rst for the docs, then the desired flow is to place it here as a .py file that can then be converted later (typically real examples using NPs etc.). This used to be a stricter rule. Super light examples (simulated data or MEArec currently) would go to the module gallery in the appropriate section. I honestly am neutral on the location of the tutorials, but just an FYI as you work on this (in case it is decided it needs to move and be rewritten in .py style).

And thanks! I think the growing number of multishank datasets really requires some documentation for SI users to understand how to do this with the machinery.

@JoeZiminski JoeZiminski changed the title Add documentatino of preprocessing and sorting split by channel group. Add documentation of preprocessing and sorting split by channel group. Dec 8, 2023
@zm711 zm711 added the documentation Improvements or additions to documentation label Dec 8, 2023
@JoeZiminski
Collaborator Author

Thanks @zm711! It is pretty light with toy example data so I think 'Module Example Gallery' would be the right place, maybe under 'core' as it contains information on both preprocessing and sorting?

@JoeZiminski JoeZiminski marked this pull request as ready for review December 11, 2023 15:31
@alejoe91
Member

Hi @JoeZiminski

Thanks so much for this!! @zm711 honestly I think that the gallery is mainly there as legacy and we will remove it in a future refactoring of the docs. The modules documentation is already the place for detailed information about each module, and IMO the gallery is just redundant (plus hard to maintain!). What do you think?

In that regard, I'd keep this in the "How to" section!

@zm711
Collaborator

zm711 commented Dec 12, 2023

I was always in support of the How to. It just makes mental sense to me whereas module gallery was always confusing to me as a concept.

I think getting rid of them would be okay (they do break the most--although I will say they did help me catch a bug in one of the widgets one time :) ). My biggest problem with them is that they currently (almost) all require datalad. So if they stick around I really think they need to switch to being a simulated dataset so that anyone can run them rather than just people with the fortitude to get datalad working.

@samuelgarcia
Member

I think we need to keep the gallery, which is auto-generated, as long as it is fast enough.
The how-to section is heavy stuff that we generate locally, but we keep a jupytext file to be able to regenerate it when necessary.

@alejoe91
Member

@JoeZiminski did you intend to include this: doc/sg_execution_times.rst? What is it?

@samuelgarcia
Member

Hi @JoeZiminski
When we do a how-to doc which is generated from a notebook (I think this is your case), then we need to keep the notebook inside the source code in case we want to modify it one day.
In that case we decided to use jupytext to put the notebook in examples/how_to/*.py to work on it later.
We push the .py but not the .ipynb, and finally convert it to rst and copy/paste the rst into doc/how_to/*.rst (a minimal sketch of this conversion is at the end of this comment).
This is important to be able to re-run the notebook without keeping the .ipynb.
Here you only pushed the converted RST.
Have a look at this https://github.com/SpikeInterface/spikeinterface/tree/main/examples/how_to

Maybe I am wrong and you directly wrote the rst. Then it is OK.

In short, we have 3 types of documentation:

  1. pure RST, which is at the moment more or less modules/*. If we modify the API these files are not changed
  2. generated from examples/*.py and rendered automatically as html/py/ipynb in the sphinx-gallery
  3. locally run notebooks manually transformed into rst.

For me:
case 2 is very important even if Alessio does not like the actual content. We can improve the content.
case 3 should be avoided when the generation is super fast (we should put it in case 2), but when it implies long computation or heavy data we have no choice.
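
For reference, a minimal sketch of the case 3 conversion using the jupytext and nbconvert Python APIs (the file names are placeholders and the jupytext CLI works equally well):

import jupytext
from nbconvert import RSTExporter

# examples/how_to/*.py (jupytext format) -> notebook
notebook = jupytext.read("examples/how_to/process_by_channel_group.py")
jupytext.write(notebook, "process_by_channel_group.ipynb")

# run the notebook locally, then export it to rst and copy it into doc/how_to/
body, resources = RSTExporter().from_filename("process_by_channel_group.ipynb")
with open("doc/how_to/process_by_channel_group.rst", "w") as f:
    f.write(body)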

@zm711
Collaborator

zm711 commented Dec 13, 2023

Maybe I should open an issue where we can discuss the rules for example generation, and then once we have the official rules laid out I can open a PR to add them to the development portion of the docs, so we have it set in stone how we want to organize examples. Does that sound like a plan? I think the structure isn't clear currently, so we keep rehashing where examples should go.

@samuelgarcia
Member

I agree that the structure is not perfect and it should be orthogonal to this case 1, but the sphinx-gallery needs to somehow be a section in the main toctree.
Let's put this in an issue and maybe let's write it in the doc itself!!

@JoeZiminski
Collaborator Author

JoeZiminski commented Dec 14, 2023

Thanks everyone!

For now I did not make the documentation in a .py file because I could not get some cool features like notes / warnings to work. Also, dropdowns are not working; I think this is because of the sphinx version but am not sure. I was going to propose updating the sphinx version, however that is outside the scope of this PR.

I will redo this page in the standard format as described above (.py > .ipynb > .rst) and can move discussion of the docs to #2327 and add those features back later. Cheers!

@h-mayorquin
Collaborator

This looks great.

I second @alejoe91:

@JoeZiminski did you intend to include this: doc/sg_execution_times.rst? What is it?

Out of curiosity, what do you mean here by unpredictable behavior:
[screenshot of the quoted docs warning about unpredictable behavior]

Also you say:

I will redo this page in the standard format as described above (.py > .ipynb > .rst)

Why? I think the rst works better than .py and, as Sam said:

Maybe I am wrong and you directly wrote the rst. Then it is OK.

doc/how_to/process_by_channel_group.rst (outdated, resolved)


Further notes on preprocessing by channel group
Collaborator


I find this section a bit confusing. Is the goal of it to eliminate the confusion that people might have of thinking that when they apply a preprocessing step to the aggregate recording it will still apply per-group?

If so, I would start the section by stating the possible confusion, then saying that this is not correct, and then give your illustrative example.

I feel that the first paragraph is trying to add some context, but I don't think that's necessary, as this is at the end of a tutorial that should have made this all clear.
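
In other words, something like this is the distinction I think the section should lead with (a tiny sketch only, assuming the spikeinterface API used elsewhere in this PR and a recording that already has a 'group' property):

import spikeinterface.preprocessing as spre
from spikeinterface.core import aggregate_channels

# Applied to the aggregated recording: the reference is computed across ALL channels
referenced_all = spre.common_reference(recording, operator="median")

# Applied per group: split, preprocess each group, then aggregate
per_group = [
    spre.common_reference(sub_recording, operator="median")
    for sub_recording in recording.split_by("group").values()
]
referenced_by_group = aggregate_channels(per_group)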

Collaborator Author


Thanks @h-mayorquin, this is a very good point; it is strange to have a section at the end re-visiting the entire article in more detail. I've dropped the second half of this section and integrated the rest into a note within the page body. Let me know what you think!

Collaborator


I re-read the current one and it looks great to me!

import numpy as np
import spikeinterface.extractors as se

# Create a toy 384 channel recording with 4 shanks (each shank contains 96 channels)
recording, _ = se.toy_example(duration=[1.00], num_segments=1, num_channels=384)
Collaborator


Open question:
When do we use toy_example and when do we use generate_recording or generate_ground_truth_recording? I personally would prefer to only use the latter, but maybe it does not matter.
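
For context, the three alternatives being compared (a hedged sketch; exact signatures and defaults may differ between versions):

import spikeinterface.extractors as se
from spikeinterface.core import generate_recording, generate_ground_truth_recording

# toy_example: recording + sorting, with a dummy probe attached
recording1, sorting1 = se.toy_example(duration=[1.0], num_segments=1, num_channels=16)

# generate_recording: recording only
recording2 = generate_recording(num_channels=16, durations=[1.0])

# generate_ground_truth_recording: recording + ground-truth sorting
recording3, sorting3 = generate_ground_truth_recording(num_channels=16, durations=[1.0])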

Collaborator Author


I am not sure on this and will defer to @alejoe91 @samuelgarcia

Member


I think that for docs toy_example is ok! It's actually easier to interpret

doc/how_to/process_by_channel_group.rst (resolved)
@alejoe91 alejoe91 added this to the 0.100.0 milestone Jan 9, 2024
@alejoe91
Member

@JoeZiminski maybe this was buried in the discussion!

@JoeZiminski did you intend to include this: doc/sg_execution_times.rst? What is it?

Did you include it on purpose? If so, should it be referenced somewhere? I would remove it for now!

@alejoe91
Member

@JoeZiminski just pinging you on this! We are planning a release on Friday, so it would be great if you can tackle the remaining points by then :)

@alejoe91 alejoe91 removed this from the 0.100.0 milestone Feb 5, 2024
@JoeZiminski
Collaborator Author

@alejoe91 @h-mayorquin @samuelgarcia my apologies for the delay!

  • the execution times file was a mistake, apologies; it somehow got created when building the docs
  • the warning section on repeatedly applying aggregate_channels is a little unspecific. Once, for a pipeline, I tried to perform the split-aggregate separately for each preprocessing step (for unimportant reasons); however, after this get_traces() became extremely slow. This was resolved by splitting/aggregating only once and performing the whole preprocessing within this single split/aggregate, as suggested in the docs (the two patterns are sketched below). So I thought it was worth mentioning, but did not look into it any further. With the restructuring this is now a sentence in the 'note' section rather than a big warning, so there is less emphasis on this point. Do you think it is still worth including? It would be interesting to look further into this slowing behaviour, but it seems non-critical and is a result of non-standard use of the function, so it is probably low priority.
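
Roughly, the two patterns were as follows (a sketch only; the exact cause of the slowdown was not investigated, and recording is any recording with a 'group' property set):

import spikeinterface.preprocessing as spre
from spikeinterface.core import aggregate_channels

# Pattern that turned out to be slow at get_traces(): split and re-aggregate
# around every preprocessing step
step_1 = [spre.bandpass_filter(r) for r in recording.split_by("group").values()]
rec = aggregate_channels(step_1)
step_2 = [spre.common_reference(r) for r in rec.split_by("group").values()]
rec = aggregate_channels(step_2)

# Recommended pattern: split once, apply the whole per-group chain, aggregate once
preprocessed = []
for group_recording in recording.split_by("group").values():
    group_recording = spre.bandpass_filter(group_recording)
    group_recording = spre.common_reference(group_recording)
    preprocessed.append(group_recording)
rec = aggregate_channels(preprocessed)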

Otherwise from my end I think this is good to go, thanks a lot for your feedback!

Collaborator

@zm711 zm711 left a comment


one super tiny typo :)

doc/how_to/process_by_channel_group.rst (outdated, resolved)
@samuelgarcia samuelgarcia merged commit fe937a1 into SpikeInterface:main Mar 8, 2024
9 of 10 checks passed
@JoeZiminski
Collaborator Author

Thanks all for your reviews!
