This repository contains the code needed to reproduce all experiments in the paper "Adding User Feedback To Enhance CB-Whisper".
Create a conda environment and activate it
conda create -n biasing-whisper python=3.10
conda activate biasing-whisper
Install ffmpeg and the necessary requirements
conda install 'ffmpeg<5'
pip install -r requirements.txt
The following bash scripts create the folder structures expected by the scripts in this repository and perform all the pre-processing needed to train and evaluate the KWS classifier.
To build the Aishell-KWS dataset, download the data_aishel.tgz file from here and place it in the directory where the dataset will be built.
In the project directory, do as follows
cd datasets/aishell/
Activate the conda environment created above
conda activate biasing-whisper
And run the bash script
bash build.sh
You will be asked to provide the path to the tgz file. The build takes several hours, so run it inside a tmux session and let it finish uninterrupted.
To build the MLS-KWS dataset, download the zip files from here for the English, German, French, Spanish, Portuguese and Polish languages. The remaining steps are the same as for the Aishell-KWS dataset.
To build the ACL6060 dataset, download the zip file from here. The remaining steps are the same as for the Aishell-KWS dataset.
The CNN classifier for KWS is inspired by the one originally proposed in CB-Whisper. This repository contains additional features that make it possible to reproduce the KWS experiments performed in the aforementioned paper:
- Training using features derived from TTS-generated keyword audio, natural-speech keyword audio, or a mixture of both;
- DANN and DANNCE implementation;
- Validation of checkpoints using more than one dataset (for domain generalization analysis).
In the project directory, do as follows
cd src/
And run the following command
python3 run_CLI.py fit --config configs/train.yaml
In the train.yaml file, you can set the hyperparameters, the paths to the dataset folders, the datasets on which the model is validated after every epoch, the logger, and other training details. Settings that you must fill in are written in capital letters between square brackets.
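Before launching a run, it can help to check that no placeholder was left unfilled. The snippet below is a minimal sketch, not part of this repository: it assumes that placeholders look like `[PATH_TO_DATASET]` (capital letters, digits, spaces, underscores or hyphens between square brackets), and both the function name `find_placeholders` and the keys in the sample config are hypothetical.

```python
import re

# Assumed placeholder shape: capital letters, digits, spaces, "_" or "-"
# between square brackets, e.g. [PATH_TO_DATASET]. Adjust the pattern if
# the config files follow a different convention.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9_\- ]*\]")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs for unfilled placeholders in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Drop everything after "#" so that YAML comments are not reported.
        content = line.split("#", 1)[0]
        for match in PLACEHOLDER.finditer(content):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical config excerpt, used only to illustrate the check.
sample_config = """\
trainer:
  max_epochs: 50
data:
  root: [PATH_TO_DATASET]
"""

for number, placeholder in find_placeholders(sample_config):
    print(f"line {number}: {placeholder} still needs a value")
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the resulting string to `find_placeholders`.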
The following can be used to evaluate the precision, recall and F1 scores of the KWS classifier on the test sets of the different datasets.
In the project directory, do as follows
cd src/
And run the following command
python3 kws.py test --config configs/kws-***.yaml
For ease of use, there is one yaml config file per dataset. Remember to set the paths to the dataset folders and to the checkpoint you want to evaluate. Settings that you must fill in are written in capital letters between square brackets.
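For reference, the three metrics mentioned above can be computed from raw detection counts as follows. This is a generic sketch of the standard definitions, not the code used by kws.py, which may aggregate or average the scores differently; the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from detection counts.

    Each score falls back to 0.0 when its denominator is zero.
    """
    predicted = true_positives + false_positives  # keywords the classifier flagged
    actual = true_positives + false_negatives     # keywords really present
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Example with made-up counts: 8 correct detections, 2 false alarms, 8 misses.
precision, recall, f1 = precision_recall_f1(8, 2, 8)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```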
The following can be used to evaluate the entity recall of the CB-Whisper model on the test sets of the different datasets, using the KWS classifiers developed with these scripts. These results were not reported in the paper "Adding User Feedback To Enhance CB-Whisper". This version of CB-Whisper uses a wrapped version of Hugging Face's WhisperForConditionalGeneration, also known as PBAWhisper, that can perform long-form transcription jointly with keyword spotting on the fly.
In the project directory, do as follows
cd src/
And run the following command
python3 cb-whisper.py test --config configs/cb-whisper-***.yaml
For ease of use, there is one yaml config file per dataset. Remember to set the paths to the dataset folders and to the checkpoint you want to evaluate. Settings that you must fill in are written in capital letters between square brackets.
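To make the entity recall metric concrete, the sketch below shows one common way to compute it: the fraction of entity mentions in the reference transcripts that also appear in the model output. This definition is an assumption, since the exact matching rules used by cb-whisper.py (text normalization, tokenization, handling of repeated mentions) are not described here, and the function name `entity_recall` is hypothetical.

```python
def entity_recall(references, hypotheses, entities):
    """Fraction of entity mentions in the references that the hypotheses recover.

    Matching is a case-insensitive substring count, which is a simplification:
    it ignores word boundaries and any text normalization.
    """
    found = 0
    total = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_text = reference.lower()
        hyp_text = hypothesis.lower()
        for entity in entities:
            term = entity.lower()
            ref_count = ref_text.count(term)
            hyp_count = hyp_text.count(term)
            total += ref_count
            # A mention counts as recovered at most once per reference occurrence.
            found += min(ref_count, hyp_count)
    return found / total if total else 0.0


# Made-up transcripts: "fairseq" is misrecognized in the first hypothesis.
references = ["we use whisper and fairseq", "whisper is robust"]
hypotheses = ["we use whisper and fair seek", "whisper is robust"]
print(entity_recall(references, hypotheses, ["Whisper", "fairseq"]))
```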
See the LICENSE.md file for details.
If you use any of the resources in this repository, please cite the following paper:
Citation will be added in the future.