
Data Format and Configuration Parameters

catubc edited this page Mar 25, 2020 · 3 revisions

Raw Data Format

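YASS takes two input files: the raw data file and a 2D geometry file. The raw data (generally 2-byte integers, i.e. 'int16' in Python) is a flat binary stream with the channels interleaved: all channels for the first time sample, then all channels for the second time sample, and so on:

Channel_1_Time_1, Channel_2_Time_1, ..., Channel_C_Time_1, Channel_1_Time_2, ..., Channel_C_Time_T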
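
As a minimal sketch of this layout (not YASS's own reader), the snippet below writes a tiny int16 recording to disk and reads it back with NumPy. It assumes the samples are stored one after another with all channels of a sample kept together. The file name `example.bin` and the sizes are made up for illustration.

```python
import os
import tempfile

import numpy as np

n_channels = 4  # hypothetical channel count (C)
n_samples = 5   # hypothetical number of time samples (T)

# Build a small (time, channel) array and write it as flat int16, so all
# channels of sample 1 come first, then all channels of sample 2, and so on.
recording = np.arange(n_samples * n_channels, dtype=np.int16)
recording = recording.reshape(n_samples, n_channels)
path = os.path.join(tempfile.mkdtemp(), "example.bin")
recording.tofile(path)

# Read it back: one row per time sample, one column per channel.
loaded = np.fromfile(path, dtype=np.int16).reshape(-1, n_channels)

first_sample_all_channels = loaded[0]  # Channel_1..Channel_C at Time_1
channel_2_trace = loaded[:, 1]         # Channel_2 across all time samples
```

If the reshape fails or the traces look scrambled, the number of channels or the dtype probably does not match the file.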

Geometry

The geometry file is an ASCII text file containing the X and Y coordinates of each electrode, one electrode per line, in the same order as the channels in the raw binary file. The two values are separated by whitespace, not commas:

-7.350000000000000000e+02 4.500000000000000000e+02

-6.750000000000000000e+02 4.500000000000000000e+02

-6.150000000000000000e+02 4.500000000000000000e+02

...
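
A geometry file in this format can be written and read with NumPy. The sketch below uses `np.savetxt`, whose default settings produce space-separated values in the same scientific notation as the rows above. The file name `geom.txt` is a placeholder, and the three coordinates are the ones listed above.

```python
import os
import tempfile

import numpy as np

# One row per electrode: (x, y), in the same order as the channels
# in the raw binary file.
geometry = np.array([
    [-735.0, 450.0],
    [-675.0, 450.0],
    [-615.0, 450.0],
])

geom_path = os.path.join(tempfile.mkdtemp(), "geom.txt")

# Default np.savetxt output: "%.18e" values separated by a single space.
np.savetxt(geom_path, geometry)

# Reading it back gives an array of shape (n_channels, 2).
loaded_geometry = np.loadtxt(geom_path)
```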

Additional Parameters

The config.yaml file contains several sections and it can be edited either using a text editor (recommended) or using the YASS GUI (limited changes only).

File locations

data:

  `root_folder: [insert folder location of install]`

  `# recordings filename (must be a binary file), details about the recordings`
  `# are specified in the recordings section`
  `recordings: [insert binary filename]`
  
  `# channel geometry filename, supports txt (one x, y pair per line,`
  `# separated by spaces) or a npy file with shape (n_channels, 2),`
  `# where every row contains an x, y pair. see yass.geometry.parse for details`
  `geometry: [insert name of geometry text file]`

Sampling rate and number of channels

recordings:

  `# precision of the recording – must be a valid numpy dtype`
  `dtype: int16 [only int16 is supported currently]`
  
  `# recording rate (in Hz)`
  `sampling_rate: [sampling rate] `
  
  `# number of channels`
  `n_channels: [number of channels]`
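
These three values can be sanity-checked against the size of the binary file before running anything. The helper below is a hypothetical sketch and not part of YASS: it checks that the dtype is a valid NumPy dtype and that the file size is a whole number of samples, then returns the recording duration in seconds.

```python
import numpy as np


def recording_duration_sec(file_size_bytes, dtype, sampling_rate, n_channels):
    """Return the recording duration implied by the file size.

    Hypothetical helper for checking config values; not a YASS function.
    """
    # np.dtype raises TypeError if `dtype` is not a valid numpy dtype.
    itemsize = np.dtype(dtype).itemsize
    bytes_per_sample = itemsize * n_channels
    if file_size_bytes % bytes_per_sample != 0:
        raise ValueError(
            "file size is not a multiple of n_channels * itemsize; "
            "check dtype and n_channels"
        )
    n_samples = file_size_bytes // bytes_per_sample
    return n_samples / sampling_rate


# Example: 60 s of 49-channel int16 data sampled at 20 kHz (made-up values).
example_size = 60 * 20000 * 49 * 2
example_duration = recording_duration_sec(example_size, "int16", 20000, 49)
```

For a real recording, `os.path.getsize` on the binary file would supply `file_size_bytes`.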

Spatial radius of local features and duration of spike

  `# spatial radius within which channels are considered neighbors, see`
  `# yass.geometry.find_channel_neighbors for details`
  `spatial_radius: [distance between channels + 10]`
  
  `# temporal length of waveforms in ms. It must capture`
  `# the full shape of waveforms, but longer means slower`
  `spike_size_ms: 3 [3 ms is the default]`
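
To illustrate what these two parameters control, the sketch below marks channels as neighbors when they lie within `spatial_radius` of each other, and converts `spike_size_ms` to a number of samples. It is an approximation for intuition only: it does not reproduce `yass.geometry.find_channel_neighbors`, and YASS may round the waveform length differently. The 60-unit electrode spacing comes from the example geometry; the 20 kHz sampling rate is an assumption.

```python
import numpy as np

# Three electrodes spaced 60 units apart (from the geometry example).
example_geometry = np.array([
    [-735.0, 450.0],
    [-675.0, 450.0],
    [-615.0, 450.0],
])

# Rule of thumb from the config comment: channel spacing + 10.
spatial_radius = 60.0 + 10.0

# Pairwise Euclidean distances between all electrodes.
diff = example_geometry[:, None, :] - example_geometry[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

# Boolean (n_channels, n_channels) matrix; each channel neighbors itself.
neighbors = distances <= spatial_radius

# Waveform length in samples for a given sampling rate (assumed 20 kHz).
assumed_sampling_rate = 20000
spike_size_ms = 3
spike_size_samples = int(round(spike_size_ms * assumed_sampling_rate / 1000))
```

With this radius each electrode is linked to its immediate neighbors only. A larger radius pulls in more channels per spike and increases the cost of every later step.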

Template clustering using 5 minutes of data

  `# chunks to run clustering on (in seconds)`
  `# leave blank to run clustering on entire dataset [not recommended]`
  `clustering_chunk: [0, 300]  # by default, clustering is run on the first 5 minutes of data`
  
  `# chunks to run final deconv on (in seconds)`
  `# leave blank to run it on the full dataset`
  `final_deconv_chunk:         # default: leave blank`
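
Both chunk settings are given in seconds. As a rough illustration of how such a setting maps onto the recording, the hypothetical helper below (not a YASS function) converts a `[start, end]` chunk into a sample range, and treats a blank value (`None` after YAML parsing) as the whole recording.

```python
def chunk_to_samples(chunk_sec, sampling_rate, n_samples_total):
    """Convert a [start, end] chunk in seconds to a (start, end) sample range.

    Hypothetical helper for illustration; a blank setting (None) selects
    the full recording, and the range is clipped to the recording length.
    """
    if chunk_sec is None:
        return 0, n_samples_total
    start_sec, end_sec = chunk_sec
    start = max(0, int(start_sec * sampling_rate))
    end = min(n_samples_total, int(end_sec * sampling_rate))
    return start, end


# A 10-minute recording at an assumed 20 kHz sampling rate.
total_samples = 600 * 20000
clustering_range = chunk_to_samples([0, 300], 20000, total_samples)
deconv_range = chunk_to_samples(None, 20000, total_samples)
```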

Multi-core CPU and GPU processing

resources:

  `# CPU multi-processing flag: 1 = use multiple cores`
  `multi_processing: 1`
  
  `# Number of CPU cores to use; recommended to set to # of physical cores available on CPU`
  `n_processors: 16`
  
  `# Length of processing chunks; if memory issues arise, decrease value`
  `n_sec_chunk: 10`
  
  `# number of GPUs to use [multi-GPU options are currently being implemented]`
  `n_gpu_processors: 1`
  
  `# n_sec_chunk for gpu detection; if memory issues arise, decrease value`
  `n_sec_chunk_gpu_detect: 0.5`
  
  `# n_sec_chunk for gpu deconvolution; if memory issues arise, decrease value`
  `n_sec_chunk_gpu_deconv: 5`
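
When choosing the chunk lengths, it can help to estimate how much raw data one chunk holds. The back-of-the-envelope sketch below counts only the raw samples in a chunk. Actual memory use in YASS will be higher because of intermediate buffers, so treat the result as a lower bound. The 512-channel, 20 kHz recording is an assumed example.

```python
import numpy as np


def chunk_bytes(n_sec_chunk, sampling_rate, n_channels, dtype="int16"):
    """Size in bytes of the raw samples in one processing chunk.

    Lower-bound estimate for illustration; not a YASS function.
    """
    n_samples = int(n_sec_chunk * sampling_rate)
    return n_samples * n_channels * np.dtype(dtype).itemsize


# One 10 s chunk of the assumed recording.
per_chunk = chunk_bytes(10, 20000, 512)

# If each of 16 processes holds its own chunk at the same time.
all_processes = per_chunk * 16

# The shorter GPU detection chunk (0.5 s).
gpu_detect_chunk = chunk_bytes(0.5, 20000, 512)
```

Halving `n_sec_chunk` halves the per-chunk figure, which is why decreasing it is the first thing to try when memory runs out.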

Neural Network training:

detect:

`# model name: either an absolute path to a model`
`# (e.g. /path/to/my/model.ckpt) or the name of one of the models included in yass`
`# (e.g. detect.pt). Check /src/yass/assets/nn_models for the list of available`
`# models.`
`filename: detect.pt`

`n_filters: [16, 8, 8] `

denoise:

`# model name, same rule as detect filename`
`filename: denoise.pt`
`n_filters: [16, 8, 4]`
`filter_sizes : [5, 11, 21]`

training:

`# input spike train file name. It must be a numpy file containing an array of shape n x 2,`
`# where n is the number of spikes, the first column is the spike times`
`# (center of spikes), and the second column is the unit id.`
`# if you want to run yass without nn to make the spike train, leave it empty`
`input_spike_train_filname:`

`# temporal length of spike to go into nn`
`# leave it empty if not training`
`spike_size_ms:`
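
A spike train file in the n x 2 format described above can be produced with NumPy. The sketch below is an illustration with made-up spike times and unit ids. It assumes the spike times are sample indices (the text does not state the unit), and the file name `spike_train.npy` is a placeholder.

```python
import os
import tempfile

import numpy as np

# Made-up example: four spikes from three units.
# Assumption: spike times are sample indices at the center of each spike.
spike_times = np.array([45, 120, 300, 410])
unit_ids = np.array([1, 0, 0, 2])

# Column 0: spike time, column 1: unit id -> shape (n_spikes, 2).
spike_train = np.column_stack([spike_times, unit_ids])

spike_train_path = os.path.join(tempfile.mkdtemp(), "spike_train.npy")
np.save(spike_train_path, spike_train)

# Reload to confirm the shape that the training section expects.
reloaded = np.load(spike_train_path)
```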