ACDC Version 2022 is a Python 3.7 release of the Animal Call Detection and Classification code, powered by Keras/TensorFlow.
Studying the social behavior of animals in a laboratory setting typically generates a wealth of unstructured data. These data, such as hours-long video and/or audio recordings, are tedious for humans to analyze manually. ACDC is a project that seeks to solve this problem for audio analysis, helping researchers train a neural network and then automatically detect and classify animal calls, outputting the results in a convenient format.
To train, the user places training data for discrete call types in the `training_data` folder and then instructs ACDC to develop detection and classification models based on that data. To process recordings, the user places recordings in the `recordings` folder and tells ACDC to process them; files containing the call labels are then generated and placed in an output folder. Operation is mainly through a command-line interface with numbered options, where the user enters the number of the action to perform.
- Removal of lesser-used features
- Fixed issues with newer versions of several packages, including Keras/TensorFlow
- Save to Audacity labels format
- Model save and load debugged
- Installation using requirements.txt
The full recording is split up into overlapping segments, each of a certain length (e.g. 0.5s). Each segment is fed to the multi-class classifier, which determines which type of call that segment contains. Since there is a high degree of overlap between the segments, each section of the spectrogram is essentially covered many times. The results are put in a time series, and the "Scanner" class then goes through the raw results, smoothing them and discarding contiguous runs of segments that are shorter than a certain proportion of the average call length (e.g. if the average phee call is 1s and a contiguous set of segments was labeled phee but lasted only 0.3s in total, it would be discarded). These steps effectively create a "voting" scheme. If there is a false positive in one segment and one segment only, these steps will likely smooth it over or weed it out. Conversely, a false negative in a sea of true positives will not disrupt the chain.
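As an illustration of this smoothing-and-filtering idea (this is not ACDC's actual `Scanner` class; the window size, the 50% duration cutoff and the function names are assumptions made for the sketch):

```python
# Illustrative sketch only -- not ACDC's Scanner implementation.
# Input: one predicted label per overlapping segment, in time order.
from collections import Counter

def smooth_labels(labels, window=5):
    """Majority vote over a sliding window of segment labels (window size is an assumption)."""
    half = window // 2
    return [Counter(labels[max(0, i - half):i + half + 1]).most_common(1)[0][0]
            for i in range(len(labels))]

def drop_short_runs(labels, seg_step_s, avg_call_len_s, min_fraction=0.5):
    """Relabel contiguous runs of a call that last less than
    min_fraction * the average call length (durations in seconds)."""
    out = list(labels)
    start = 0
    while start < len(labels):
        end = start
        while end < len(labels) and labels[end] == labels[start]:
            end += 1
        call = labels[start]
        if call != 'Noise' and (end - start) * seg_step_s < min_fraction * avg_call_len_s.get(call, 0.0):
            out[start:end] = ['Noise'] * (end - start)   # discard the short run
        start = end
    return out

# Example: a lone false-positive 'Ph' is voted away by its neighbors.
raw = ['Noise', 'Ph', 'Noise', 'Noise', 'Ph', 'Ph', 'Ph', 'Noise']
print(smooth_labels(raw))
```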
- Download the repo and unzip the files in the directory where you want them
- Install Anaconda https://www.anaconda.com/
- Create a new environment with Python 3.7 in Anaconda Navigator
- Click the environment, click Open Terminal (a command line terminal will open)
- In the terminal, navigate to the directory that has the ACDC files (where `acdc.py` is)
- Type `pip install -r requirements.txt` and hit enter. Pip should now install all the required packages.
To run ACDC, type `python acdc.py`. The following menu should appear:
Now enter the number for the action you want to perform. Note that you first need data to run these options.
Option 1 (prepare training data) requires having training data in the `training_data` folder.
Option 2 (train models) assumes that option 1 has already been done.
Option 3 (process recordings) requires a trained model in the `models` folder and a recording in the `recordings` folder.
- Put training data in the `training_data` folder. There should be a sub-folder for each class, and the name of each sub-folder has to match a class listed in the variable `WINDOW_LENGTHS` in `variables.py` (a quick layout check is sketched after this list). The folders should contain wave files (.wav format, mono, 48kHz, 16 bits/sample) with the target calls, edited to start and stop at the beginning and end of the call. The training samples need to be good-quality, clean, representative examples of what will be encountered in the recordings. There should also be a folder named `Noise` with representative samples of noises that are loud enough to cross the threshold but do not belong to any of the target classes. Set `TRAINING_SEGMENTS_PER_CALL` sufficiently high for data augmentation to take place.
- To run data preparation, enter the corresponding number in the menu. Output is a file called `acdc.tdata` in the `models` folder.
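As a quick sanity check of this folder layout (the snippet below is not part of ACDC; it only assumes it is run from the ACDC directory so that `variables.py` is importable):

```python
# Hypothetical helper, not part of ACDC: check that training_data contains
# one sub-folder per class in WINDOW_LENGTHS, plus the Noise folder.
import os
from variables import WINDOW_LENGTHS   # variables.py ships with ACDC

expected = set(WINDOW_LENGTHS) | {'Noise'}
found = {d for d in os.listdir('training_data')
         if os.path.isdir(os.path.join('training_data', d))}

print('Missing folders:   ', sorted(expected - found))
print('Unexpected folders:', sorted(found - expected))
```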
- Once *prepare training data* has been run, there should be a file called `acdc.tdata` in the `models` folder and models can be trained. Make sure to set `TRAINING_EPOCHS` in `variables.py` sufficiently high (>10) for the model to optimize (illustrative values are shown after this list).
- To run model training, enter the corresponding number in the menu. Output is a set of files and sub-folders in `models` representing the trained model.
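For reference, the training-related settings live in `variables.py`. The values below are purely illustrative examples, not recommendations from the ACDC authors:

```python
# Illustrative values in variables.py -- tune them for your own data set.
TRAINING_EPOCHS = 20               # >10 so the model has enough iterations to optimize
TRAINING_SEGMENTS_PER_CALL = 500   # example target; set it near the segment count of the largest class
```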
- Once a model has been trained, it should be in the `models` folder and recordings (.wav format, mono, 48kHz, 16 bits/sample) can be processed. Put wave files for analysis in the `recordings` folder.
- To process recordings, enter the corresponding number in the menu. Results are stored in a new sub-directory in `results`. Sub-directories are named according to the date and time of the run, like this: [YYYYMMDD][HHMMSS][recording filename]. Results are lists of call labels in .csv and .txt format (tab-delimited, Audacity-readable) with a row for each call: the 1st column is the start time (seconds), the 2nd column the end time (seconds) and the 3rd column the call type (‘Tr’, ‘Tw’, ‘Ph’ or ‘Chi’). The csv and txt files contain the same information (an example label file is shown below).
- An easy way to view the results is by loading the wave file into Audacity (https://www.audacityteam.org/) in Spectrogram view, and then doing 'File', 'Import', 'Labels...' and selecting the .txt file with the labels.
- The user may want to try out different values for `CONFIDENCE_THRESHOLD` and `VOLUME_AMP_MULTIPLE` (both in `variables.py`) to get a better result. If that does not work, re-training with more samples may be necessary. Finally, to use a model architecture of your own, the current framework can still be useful; edit `model.py` to enter the new model (a sketch of what such an architecture could look like is included below).
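For illustration, a results label file for a recording with two detected calls might look like this (the times and call types are made up; columns are tab-separated start time, end time and call type):

```
1.25	2.31	Ph
5.80	6.05	Tw
```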
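If you do try an architecture of your own, the sketch below shows the general shape of a small Keras CNN over spectrogram segments. It is only an illustration of the kind of model one might adapt when editing `model.py`; the input shape, layer sizes, class count and function name are assumptions and do not reflect ACDC's actual code:

```python
# Illustrative only: a small CNN over spectrogram segments.
import tensorflow as tf
from tensorflow.keras import layers

def build_custom_model(input_shape=(128, 64, 1), n_classes=5):
    """Hypothetical example model; shapes and sizes are assumptions."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),                # spectrogram: freq x time x 1
        layers.Conv2D(16, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation='softmax'),  # e.g. Tr, Tw, Ph, Chi, Noise
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```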
`variables.py` contains constants used in various modules. Some of them are highlighted here because changing their values according to the user's needs can help get better results.
CONFIDENCE_THRESHOLD
This is the value that needs to be exceeded in the final layer of the model to trigger detection of a call. Lowering this value makes the model more likely to detect something but can lead to more false positives. Raising this value makes the model less likely to detect something but reduces false positives.
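Conceptually, the decision works like the snippet below (this is not ACDC's code; the class order, threshold and probabilities are made up for illustration):

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.8                        # illustrative value
class_names = ['Tr', 'Tw', 'Ph', 'Chi', 'Noise']  # assumed class order

probs = np.array([0.05, 0.04, 0.86, 0.03, 0.02])  # stand-in for the model output on one segment
if probs.max() > CONFIDENCE_THRESHOLD:
    print('Detected:', class_names[probs.argmax()])
else:
    print('Below threshold -- no call detected in this segment')
```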
TRAINING_EPOCHS
This determines the number of iterations for which the model trains. We have had good experience using at least 10 epochs.
WINDOW_LENGTHS = {'Chi': 0.25,'Tr': 0.25,'Ph': 0.40,'Tw': 0.5}
Window lengths in seconds are set for each vocalization type. The names of the calls ‘Chi’, ‘Tr’, ‘Ph’ and ‘Tw’ have to correspond to folder names in the `training_data` folder. If different or additional classes need to be trained, this variable needs to change accordingly.
TRAINING_SEGMENTS_PER_CALL
This is a target number of segments per class that determines whether the data need to be augmented. It makes sense to set this value equal to the segment count of the class with the most segments, so that the other classes are augmented up to the same number, removing class imbalance.
VOLUME_AMP_MULTIPLE
This variable determines by how much the data are amplified. A threshold is applied, and segments that do not cross it are discarded. Change this value to get the best balance between false positives and false negatives.
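As a purely conceptual illustration (not ACDC's internal code; the threshold value, multiplier and test signal are made up), amplification interacts with the threshold roughly like this:

```python
import numpy as np

VOLUME_AMP_MULTIPLE = 10.0    # illustrative value
amplitude_threshold = 0.1     # hypothetical threshold a segment must cross

t = np.linspace(0, 0.25, 12000)                 # 0.25 s at 48 kHz
segment = 0.02 * np.sin(2 * np.pi * 8000 * t)   # quiet test tone
amplified = segment * VOLUME_AMP_MULTIPLE
print('Crosses threshold:', np.abs(amplified).max() > amplitude_threshold)
```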
models
This is where trained models and pre-processed training data are stored.
recordings
This is where recordings for analysis (.wav files, mono, 48kHz, 16 bits/sample) are stored.
results
Results of processing a file are stored in this folder. A new sub-directory is created each time a file is processed. Sub-directories are named according to the date and time of the run, like this: [YYYYMMDD][HHMMSS][recording filename]. Results are lists of call labels in .csv and .txt format (tab-delimited, Audacity-readable) with a row for each call: the 1st column is the start time (seconds), the 2nd column the end time (seconds), and the 3rd column the call type (‘Tr’, ‘Tw’, ‘Ph’ or ‘Chi’). The csv and txt files contain the same information.
training_data
Training data for training a new model goes here. There should be a folder for each call type ‘Tr’, ‘Tw’, ‘Ph’, ‘Chi’ and ‘Noise’. Each training sample should be a .wav file stored in the folder corresponding to the call type. The ‘Noise’ folder should contain a representative sampling of noises that are not vocalizations but do occur in the environment where the recordings are made, such as doors opening and closing, cage sounds, et cetera. Very low-amplitude background noise does not need to be represented because thresholding already discards it.
Collaborators
Samvaran Sharma, Karthik Srinivasan, and Rogier Landman
Additional info in paper 'Unobtrusive vocalization recording in freely moving marmosets' (in prep)
This project was developed in collaboration with MIT Brain and Cognitive Sciences (c) 2016-2022.