- Studying the structure of population neural dynamics can help identify brain states and reveal how the brain encodes information about the world
- "Population neural activity" is a very-high dimensional vector, typically thought of as the firing rate of each neuron
- Due to connectivity/ learning/ etc. the actual firing rate vectors don't fully span the high dimensional space but occur on some lower-dimensional manifold
- Interpreting this data involves embedding the manifold of observed population activity into a lower-dimensional space
- Most existing methods of embedding the dynamics (e.g. CEBRA, UMAP) take sorted spike firing rate vectors as input
- This requires spike-sorting the electrical series data, which can be time-intensive and exclude low-amplitude events
- C3PO provides a 'clusterless' method to identify latent neural states from population activity
The first step of the pipeline is to extract waveform mark events from the electrophysiology data. This can be done with threshold detection to identify event times.
This gives us the set of observations $X = \{(t_i, W_i)\}_{i=1}^{n_{\mathrm{obs}}}$, where $t_i$ is the event time and $W_i$ the waveform. Since these are ordered observations, this can also be described as $X = \{(\Delta t_i, W_i)\}_{i=1}^{n_{\mathrm{obs}}}$, where $\Delta t_i = t_i - t_{i-1}$ is the interval since the previous event.
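As a concrete illustration of this step, here is a minimal threshold-detection sketch in numpy. The function name, window length, and sampling rate are illustrative, not C3PO's actual interface:

```python
import numpy as np

def extract_mark_events(trace, threshold, win=16, fs=30000.0):
    """Threshold-crossing event detection on a single filtered channel.

    Hypothetical helper: `trace` is the voltage series, `threshold` the
    (negative) detection level, `win` the number of samples kept on each
    side of a crossing.
    """
    # indices where the signal crosses below threshold (negative spikes)
    below = trace < threshold
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    # keep only crossings with a full waveform window around them
    crossings = crossings[(crossings >= win) & (crossings + win < len(trace))]
    times = crossings / fs
    waveforms = (np.stack([trace[c - win:c + win] for c in crossings])
                 if len(crossings) else np.empty((0, 2 * win)))
    # ordered observations (t_i, W_i); equivalently (dt_i, W_i)
    dts = np.diff(times, prepend=times[0] if len(times) else 0.0)
    return times, dts, waveforms
```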
With increasing use of high-density electrode probes for electrophysiology, the waveform features are themselves high-dimensional.
This motivates us to find a lower-dimensional embedding of the waveforms
The implemented loss function does not ultimately depend on the functional form of this embedding, so any differentiable encoder can be used. The embedded waveform features, together with the inter-event intervals, form a sequence of events. We can functionally define the context embedding as a function of the history of these events.
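Since the method does not prescribe a particular encoder, here is a minimal sketch of one possible waveform encoder (a small numpy MLP); all names and sizes are illustrative:

```python
import numpy as np

def make_waveform_encoder(n_samples=32, d_embed=8, d_hidden=64, seed=0):
    """One possible parameterization g: W_i -> z_i (a two-layer MLP).

    The loss does not constrain this form; any differentiable encoder
    works. All names here are illustrative, not C3PO's actual API.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1 / np.sqrt(n_samples), (n_samples, d_hidden))
    W2 = rng.normal(0, 1 / np.sqrt(d_hidden), (d_hidden, d_embed))

    def encode(waveforms):
        h = np.maximum(waveforms @ W1, 0.0)   # ReLU hidden layer
        return h @ W2                          # linear read-out to embedding
    return encode

encoder = make_waveform_encoder()
z = encoder(np.random.default_rng(1).normal(size=(5, 32)))  # 5 waveforms -> 5 embeddings
```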
Similar to Contrastive Predictive Coding (CPC), our definition of a "good" context embedding is based on its ability to predictively identify the future state of the system. Explicitly, we need to define a function that scores how well the context predicts upcoming observations.
Again, the final loss term is independent of the functional form of the parameterization
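For concreteness, one common choice in CPC is a bilinear score between the future event embedding and the context; a sketch (hypothetical helper, the actual parameterization may differ):

```python
import numpy as np

def bilinear_score(z_future, c, A):
    """Predictive score f(z, c) = z^T A c, the bilinear form used in CPC.

    z_future: (n, d) embeddings of candidate future events
    c:        (n, d) context embeddings
    A:        (d, d) learned bilinear weight matrix
    The loss only needs *some* scalar score of (future, context) pairs;
    its functional form is otherwise unconstrained.
    """
    return np.einsum("nd,de,ne->n", z_future, A, c)
```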
To summarize the architecture so far: we have embedded our sequence of waveform observations $X = \{(\Delta t_i, W_i)\}_{i=1}^{n_{\mathrm{obs}}}$ independently into a sequence of event embeddings.
We can now define the likelihood of our observations. Qualitatively, we define the probability of each observation as the probability that a spike with the given waveform occurs at that time, given the preceding context.
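One way to make this concrete, assuming the InfoNCE form suggested by the CPC analogy and the `n_neg_samples` hyperparameter below (a sketch, not necessarily the exact objective):

$$
\mathcal{L} = -\sum_i \log \frac{\exp f(W_{i+1}, c_i)}{\exp f(W_{i+1}, c_i) + \sum_{j=1}^{n_{\mathrm{neg}}} \exp f(\widetilde{W}_j, c_i)}
$$

where $c_i$ is the context embedding after observation $i$, $f$ is the predictive score function, and $\widetilde{W}_j$ are negative waveform samples drawn from other times or trials.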
- `n_neg_samples`:
  - Low values: a less specific prediction is required. The model needs to know when rates are high, but the loss is less sensitive to false positives.
  - High values: more precision is required when predicting when a unit fires. The loss is punished much more for predicting high rates at inappropriate times.
- `batch_size`: changes what your loss is contrasting against.
  - High values: spikes from a trial must be distinguished from different states and different trials; less pressure for within-trial contrast.
  - Low values: more of the contrastive loss comes from within-trial spikes, so the model has to learn more about differences over time within a trial.
- Annealing of `n_neg_samples`:
  - Start with a low value (e.g. 8) to allow the model to learn general rates and embeddings. Train until improvement is <1%.
  - Double `n_neg_samples` and train until stable.
  - Repeat up to the max value (128).
- Preliminary: `batch_size`
  - Run the protocol above once with `batch_size`=64 to learn general structure.
  - Repeat the protocol with `batch_size`=8 to refine within-trial changes.
  - TODO: verify whether the initial stage is necessary.
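The annealing protocol above can be sketched as a simple loop. The hypothetical `train_step` stands in for the real training call, which is not shown here:

```python
def annealed_training(train_step, n_neg_start=8, n_neg_max=128,
                      improvement_tol=0.01, max_epochs=1000):
    """Sketch of the n_neg_samples annealing schedule.

    `train_step(n_neg)` is assumed to run one training epoch with the
    given number of negative samples and return the epoch's loss.
    """
    n_neg = n_neg_start
    while True:
        prev = float("inf")
        for _ in range(max_epochs):
            loss = train_step(n_neg)
            # train until relative improvement drops below 1%
            if prev < float("inf") and (prev - loss) / max(abs(prev), 1e-12) < improvement_tol:
                break
            prev = loss
        if n_neg >= n_neg_max:
            return n_neg
        n_neg = min(2 * n_neg, n_neg_max)  # double and repeat
```

The same loop would be run twice under the preliminary `batch_size` protocol: once at `batch_size`=64, then again at `batch_size`=8.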