Metrify RawNet3/Resemblyzer as Keywords & Update READMEs #85

Merged · 6 commits · Jan 6, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -73,7 +73,7 @@ Amphion provides a comprehensive objective evaluation of the generated audio. Th
- **Energy Modeling**: Energy Root Mean Square Error, Energy Pearson Coefficients, etc.
- **Intelligibility**: Character/Word Error Rate, which can be calculated based on [Whisper](https://github.com/openai/whisper) and more.
- **Spectrogram Distortion**: Frechet Audio Distance (FAD), Mel Cepstral Distortion (MCD), Multi-Resolution STFT Distance (MSTFT), Perceptual Evaluation of Speech Quality (PESQ), Short Time Objective Intelligibility (STOI), etc.
- **Speaker Similarity**: Cosine similarity, which can be calculated based on [RawNet3](https://github.com/Jungjee/RawNet), [WeSpeaker](https://github.com/wenet-e2e/wespeaker), and more.
- **Speaker Similarity**: Cosine similarity, which can be calculated based on [RawNet3](https://github.com/Jungjee/RawNet), [Resemblyzer](https://github.com/resemble-ai/Resemblyzer), [WeSpeaker](https://github.com/wenet-e2e/wespeaker), and more.

### Datasets

25 changes: 7 additions & 18 deletions bins/calc_metrics.py
@@ -50,7 +50,8 @@
"v_uv_f1": extract_f1_v_uv,
"cer": extract_cer,
"wer": extract_wer,
"speaker_similarity": extract_speaker_similarity,
"rawnet3_similarity": extract_speaker_similarity,
"resemblyzer_similarity": extract_resemblyzer_similarity,
"fad": extract_fad,
"mcd": extract_mcd,
"mstft": extract_mstft,
@@ -65,23 +66,11 @@ def calc_metric(ref_dir, deg_dir, dump_dir, metrics, fs=None):
result = defaultdict()

for metric in tqdm(metrics):
if metric == "speaker_similarity":
print("Select the model to use for speaker similarity:")
print("(1) RawNet3")
print("(2) Resemblyzer")
model_choice = input("Enter the number of your choice: ").strip()

if model_choice not in ["1", "2"]:
print("Invalid choice. Exiting the program.")
sys.exit(1)

if model_choice == "1":
result[metric] = str(METRIC_FUNC[metric](ref_dir, deg_dir))
elif model_choice == "2":
similarity_score = extract_resemblyzer_similarity(
deg_dir, ref_dir, dump_dir
)
result[metric] = str(similarity_score)
if metric in ["fad", "rawnet3_similarity"]:
> **Collaborator:** Why is "fad" here? It seems that the original code does not contain a conditional check for fad. Is there a bug in the old code?
>
> **Merakist (Author), Jan 1, 2024:** Yes. When modifying the input-selection part of the old code, the FAD part was mistakenly deleted. It should be here, as shown in the original commit, at line 64: 9682d0c#diff-4fa833e1c8dd8d05d182f8262a2cc5f727dc72a364db06f8acc5536eff3e6506
>
> **Merakist (Author):** Owing to the firewalled status of huggingface.co on the Aliyun server, this went undetected: before calc_metrics.py could be tested, all parts concerning FAD had to be commented out, or else the script could not initialize correctly.
>
> **Collaborator:** @VocodexElysium Please review this.
>
> **Collaborator:** Basically, if your internet environment cannot reach Hugging Face from the terminal, importing FAD will raise errors, since it tries to connect to Hugging Face. You can avoid this by setting up a suitable VPN environment, or by downloading the necessary files yourself and adjusting the FAD code to load the model from your machine rather than from Hugging Face (I think MingXuan did this successfully). So I don't think removing the FAD-related code is necessary, since the issue only concerns the internet environment and is solvable.

result[metric] = str(METRIC_FUNC[metric](ref_dir, deg_dir))
continue
elif metric in ["resemblyzer_similarity"]:
result[metric] = str(METRIC_FUNC[metric](deg_dir, ref_dir, dump_dir))
continue

audios_ref = []
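The new dispatch in `calc_metric` replaces the interactive model prompt with a keyword lookup. A minimal, dependency-free sketch of that pattern (the metric functions below are placeholders for illustration, not Amphion's real extractors):

```python
# Sketch of the keyword-dispatch pattern used in calc_metrics.py.
# The metric functions here are stand-ins, not Amphion's real extractors.

def fake_rawnet3_similarity(ref_dir, deg_dir):
    return 0.9  # placeholder score

def fake_resemblyzer_similarity(deg_dir, ref_dir, dump_dir):
    return 0.8  # placeholder score; note the different argument order

METRIC_FUNC = {
    "rawnet3_similarity": fake_rawnet3_similarity,
    "resemblyzer_similarity": fake_resemblyzer_similarity,
}

def calc_metric(ref_dir, deg_dir, dump_dir, metrics):
    result = {}
    for metric in metrics:
        if metric in ["fad", "rawnet3_similarity"]:
            # These metrics take (ref_dir, deg_dir) and load audio themselves.
            result[metric] = str(METRIC_FUNC[metric](ref_dir, deg_dir))
            continue
        elif metric in ["resemblyzer_similarity"]:
            # Resemblyzer takes (deg_dir, ref_dir, dump_dir).
            result[metric] = str(METRIC_FUNC[metric](deg_dir, ref_dir, dump_dir))
            continue
        # ... frame-level metrics would load audio pairs here ...
    return result

print(calc_metric("ref", "deg", "dump",
                  ["rawnet3_similarity", "resemblyzer_similarity"]))
```

Each keyword maps directly to a function, so adding a new backend only requires a new dictionary entry rather than another interactive prompt.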
87 changes: 68 additions & 19 deletions egs/metrics/README.md
@@ -25,6 +25,7 @@ Until now, Amphion Evaluation has supported the following objective metrics:
- Scale Invariant Signal to Noise Ratio (SISNR)
- **Speaker Similarity**:
- Cosine similarity based on [Rawnet3](https://github.com/Jungjee/RawNet)
- Cosine similarity based on [Resemblyzer](https://github.com/resemble-ai/Resemblyzer)
- Cosine similarity based on [WeSpeaker](https://github.com/wenet-e2e/wespeaker) (👨‍💻 developing)

We provide a recipe to demonstrate how to objectively evaluate your generated audios. There are three steps in total:
@@ -37,7 +38,7 @@ We provide a recipe to demonstrate how to objectively evaluate your generated au

If you want to calculate `RawNet3` based speaker similarity, you need to download the pretrained model first, as illustrated [here](../../pretrained/README.md).

## 2. Aduio Data Preparation
## 2. Audio Data Preparation

Prepare reference audios and generated audios in two folders: `ref_dir` contains the reference audio and `gen_dir` contains the generated audio. Here is an example.

@@ -74,21 +75,69 @@ As for the metrics, an example is provided below:

All currently available metrics keywords are listed below:

| Keys | Description |
| --------------------- | ------------------------------------------ |
| `fpc` | F0 Pearson Coefficients |
| `f0_periodicity_rmse` | F0 Periodicity Root Mean Square Error |
| `f0rmse` | F0 Root Mean Square Error |
| `v_uv_f1` | Voiced/Unvoiced F1 Score |
| `energy_rmse` | Energy Root Mean Square Error |
| `energy_pc` | Energy Pearson Coefficients |
| `cer` | Character Error Rate |
| `wer` | Word Error Rate |
| `speaker_similarity` | Cos Similarity based on RawNet3 |
| `fad` | Frechet Audio Distance |
| `mcd` | Mel Cepstral Distortion |
| `mstft` | Multi-Resolution STFT Distance |
| `pesq` | Perceptual Evaluation of Speech Quality |
| `si_sdr` | Scale Invariant Signal to Distortion Ratio |
| `si_snr` | Scale Invariant Signal to Noise Ratio |
| `stoi` | Short Time Objective Intelligibility |
| Keys | Description |
| ------------------------- | ------------------------------------------ |
| `fpc` | F0 Pearson Coefficients |
| `f0_periodicity_rmse` | F0 Periodicity Root Mean Square Error |
| `f0rmse` | F0 Root Mean Square Error |
| `v_uv_f1` | Voiced/Unvoiced F1 Score |
| `energy_rmse` | Energy Root Mean Square Error |
| `energy_pc` | Energy Pearson Coefficients |
| `cer` | Character Error Rate |
| `wer` | Word Error Rate |
| `rawnet3_similarity` | Cos Similarity based on RawNet3 |
| `resemblyzer_similarity` | Cos Similarity based on Resemblyzer |
| `fad` | Frechet Audio Distance |
| `mcd` | Mel Cepstral Distortion |
| `mstft` | Multi-Resolution STFT Distance |
| `pesq` | Perceptual Evaluation of Speech Quality |
| `si_sdr` | Scale Invariant Signal to Distortion Ratio |
| `si_snr` | Scale Invariant Signal to Noise Ratio |
| `stoi` | Short Time Objective Intelligibility |
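Since the old `speaker_similarity` keyword has been replaced by explicit backend keys, metrics lists written against the old scheme need updating. A hypothetical migration helper (not part of Amphion) that maps the removed key to `rawnet3_similarity` — the backend the old key dispatched to — and validates against the table above:

```python
# Hypothetical helper (not in Amphion) for migrating a metrics list from the
# old keyword scheme to the new explicit keys.

VALID_KEYS = {
    "fpc", "f0_periodicity_rmse", "f0rmse", "v_uv_f1", "energy_rmse",
    "energy_pc", "cer", "wer", "rawnet3_similarity", "resemblyzer_similarity",
    "fad", "mcd", "mstft", "pesq", "si_sdr", "si_snr", "stoi",
}

def migrate_metrics(metrics):
    """Map the removed `speaker_similarity` key to `rawnet3_similarity`
    (the backend it previously dispatched to) and reject unknown keywords."""
    migrated = ["rawnet3_similarity" if m == "speaker_similarity" else m
                for m in metrics]
    unknown = [m for m in migrated if m not in VALID_KEYS]
    if unknown:
        raise ValueError(f"Unknown metric keywords: {unknown}")
    return migrated

print(migrate_metrics(["speaker_similarity", "fad", "pesq"]))
```

To use Resemblyzer instead, replace the old key with `resemblyzer_similarity` explicitly, since the interactive model prompt no longer exists.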



## Troubleshooting
### FAD (Using Offline Models)
If your system is unable to access huggingface.co from the terminal, you might run into an error like "OSError: Can't load tokenizer for ...". To work around this, follow these steps to use local models:

1. Download the [bert-base-uncased](https://huggingface.co/bert-base-uncased), [roberta-base](https://huggingface.co/roberta-base), and [facebook/bart-base](https://huggingface.co/facebook/bart-base) models from `huggingface.co`. Ensure that the models are complete and uncorrupted. Place these directories within `Amphion/pretrained`. For a detailed file structure reference, see [This README](../../pretrained/README.md#optional-model-dependencies-for-evaluation) under `Amphion/pretrained`.
2. Inside the `Amphion/pretrained` directory, create a bash script with the content outlined below. This script will automatically update the tokenizer paths used by your system:
```bash
#!/bin/bash

BERT_DIR="bert-base-uncased"
ROBERTA_DIR="roberta-base"
BART_DIR="facebook/bart-base"
PYTHON_SCRIPT="[YOUR ENV PATH]/lib/python3.9/site-packages/laion_clap/training/data.py"

update_tokenizer_path() {
local dir_name=$1
local tokenizer_variable=$2
local full_path

if [ -d "$dir_name" ]; then
full_path=$(realpath "$dir_name")
if [ -f "$PYTHON_SCRIPT" ]; then
sed -i "s|${tokenizer_variable}.from_pretrained(\".*\")|${tokenizer_variable}.from_pretrained(\"$full_path\")|" "$PYTHON_SCRIPT"
echo "Updated ${tokenizer_variable} path to $full_path."
else
echo "Error: The specified Python script does not exist."
exit 1
fi
else
echo "Error: The directory $dir_name does not exist in the current directory."
exit 1
fi
}

update_tokenizer_path "$BERT_DIR" "BertTokenizer"
update_tokenizer_path "$ROBERTA_DIR" "RobertaTokenizer"
update_tokenizer_path "$BART_DIR" "BartTokenizer"

echo "BERT, BART and RoBERTa Python script paths have been updated."

```

3. The script provided is intended to adjust the tokenizer paths in the `data.py` file, found under `/lib/python3.9/site-packages/laion_clap/training/`, within your specific environment. For those utilizing conda, you can determine your environment path by running `conda info --envs`. Then, substitute `[YOUR ENV PATH]` in the script with this path. If your environment is configured differently, you'll need to update the `PYTHON_SCRIPT` variable to correctly point to the `data.py` file.
4. Run the script. If it executes successfully, the tokenizer paths will be updated, allowing them to be loaded locally.
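For readers who prefer to see the substitution logic without running `sed`, the same rewrite can be sketched in Python (the `sample` line is illustrative; the real target is the `from_pretrained` calls inside laion_clap's `data.py`):

```python
# Python equivalent of the bash script's sed substitution: rewrite a
# `from_pretrained("...")` argument to point at a local model directory.
import re

def update_tokenizer_path(source, tokenizer_variable, full_path):
    # Match e.g. BertTokenizer.from_pretrained("anything") and swap the path.
    pattern = re.escape(tokenizer_variable) + r'\.from_pretrained\(".*"\)'
    replacement = f'{tokenizer_variable}.from_pretrained("{full_path}")'
    return re.sub(pattern, replacement, source)

sample = 'tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")'
print(update_tokenizer_path(
    sample, "BertTokenizer", "/opt/Amphion/pretrained/bert-base-uncased"))
```

Like the `sed` expression, the pattern is greedy within one line, so it assumes each `from_pretrained` call sits on a single line with a quoted string argument.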
6 changes: 4 additions & 2 deletions evaluation/metrics/similarity/resemblyzer_similarity.py
@@ -7,8 +7,8 @@
import torch
import numpy as np
import pandas as pd
import torch.nn.functional as F
from resemblyzer import VoiceEncoder, preprocess_wav
from scipy.spatial.distance import cosine


def load_wavs(directory):
@@ -40,7 +40,9 @@ def calculate_cosine_similarity(embeddings1, embeddings2, names1, names2):
similarity_info = []
for i, emb1 in enumerate(embeddings1):
for j, emb2 in enumerate(embeddings2):
similarity = 1 - cosine(emb1, emb2)
emb1_tensor = torch.tensor(emb1).unsqueeze(0)
emb2_tensor = torch.tensor(emb2).unsqueeze(0)
similarity = F.cosine_similarity(emb1_tensor, emb2_tensor)
similarity_info.append(
{"Reference": names2[j], "Target": names1[i], "Similarity": similarity}
)
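The diff swaps `1 - scipy.spatial.distance.cosine(...)` for `torch.nn.functional.cosine_similarity(...)`; both compute the same quantity, since scipy returns the cosine *distance* (one minus the similarity). A dependency-free sketch of that quantity — note that `F.cosine_similarity` returns a one-element tensor, so downstream code that needs a plain float may want to call `.item()` on it:

```python
# The quantity both APIs compute: cos(a, b) = (a · b) / (|a| * |b|).
# scipy's `cosine` returns 1 - cos(a, b); F.cosine_similarity returns cos(a, b).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

emb1 = [0.1, 0.3, 0.5]
emb2 = [0.2, 0.1, 0.4]
sim = cosine_similarity(emb1, emb2)   # what F.cosine_similarity returns
dist = 1 - sim                        # what scipy's cosine distance returns
print(sim, dist)
```

The switch therefore changes the dependency and the return type, not the metric itself.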
84 changes: 84 additions & 0 deletions pretrained/README.md
@@ -128,3 +128,87 @@ Amphion
┃ ┃ ┣ model.pt
```


# (Optional) Model Dependencies for Evaluation
When utilizing Amphion's Evaluation Pipelines, terminals without access to `huggingface.co` may encounter error messages such as "OSError: Can't load tokenizer for ...". To work around this, the dependent models for evaluation can be prepared in advance and stored here, at `Amphion/pretrained`; then follow [this README](../egs/metrics/README.md#troubleshooting) to configure your environment to load local models.

The dependent models of Amphion's evaluation pipeline are as follows (sorted alphabetically):

- [Evaluation Pipeline Models Dependency](#optional-model-dependencies-for-evaluation)
- [bert-base-uncased](#bert-base-uncased)
- [facebook/bart-base](#facebookbart-base)
- [roberta-base](#roberta-base)

Instructions on how to download each of them are given below.

## bert-base-uncased

To load `bert-base-uncased` locally, follow [this link](https://huggingface.co/bert-base-uncased) to download all files of the `bert-base-uncased` model, and store them under `Amphion/pretrained/bert-base-uncased`, conforming to the following file structure tree:

```
Amphion
┣ pretrained
┃ ┣ bert-base-uncased
┃ ┃ ┣ config.json
┃ ┃ ┣ coreml
┃ ┃ ┃ ┣ fill-mask
┃ ┃ ┃ ┣ float32_model.mlpackage
┃ ┃ ┃ ┣ Data
┃ ┃ ┃ ┣ com.apple.CoreML
┃ ┃ ┃ ┣ model.mlmodel
┃ ┃ ┣ flax_model.msgpack
┃ ┃ ┣ LICENSE
┃ ┃ ┣ model.onnx
┃ ┃ ┣ model.safetensors
┃ ┃ ┣ pytorch_model.bin
┃ ┃ ┣ README.md
┃ ┃ ┣ rust_model.ot
┃ ┃ ┣ tf_model.h5
┃ ┃ ┣ tokenizer_config.json
┃ ┃ ┣ tokenizer.json
┃ ┃ ┣ vocab.txt
```

## facebook/bart-base

To load `facebook/bart-base` locally, follow [this link](https://huggingface.co/facebook/bart-base) to download all files of the `facebook/bart-base` model, and store them under `Amphion/pretrained/facebook/bart-base`, conforming to the following file structure tree:

```
Amphion
┣ pretrained
┃ ┣ facebook
┃ ┃ ┣ bart-base
┃ ┃ ┃ ┣ config.json
┃ ┃ ┃ ┣ flax_model.msgpack
┃ ┃ ┃ ┣ gitattributes.txt
┃ ┃ ┃ ┣ merges.txt
┃ ┃ ┃ ┣ model.safetensors
┃ ┃ ┃ ┣ pytorch_model.bin
┃ ┃ ┃ ┣ README.txt
┃ ┃ ┃ ┣ rust_model.ot
┃ ┃ ┃ ┣ tf_model.h5
┃ ┃ ┃ ┣ tokenizer.json
┃ ┃ ┃ ┣ vocab.json
```

## roberta-base

To load `roberta-base` locally, follow [this link](https://huggingface.co/roberta-base) to download all files of the `roberta-base` model, and store them under `Amphion/pretrained/roberta-base`, conforming to the following file structure tree:

```
Amphion
┣ pretrained
┃ ┣ roberta-base
┃ ┃ ┣ config.json
┃ ┃ ┣ dict.txt
┃ ┃ ┣ flax_model.msgpack
┃ ┃ ┣ gitattributes.txt
┃ ┃ ┣ merges.txt
┃ ┃ ┣ model.safetensors
┃ ┃ ┣ pytorch_model.bin
┃ ┃ ┣ README.txt
┃ ┃ ┣ rust_model.ot
┃ ┃ ┣ tf_model.h5
┃ ┃ ┣ tokenizer.json
┃ ┃ ┣ vocab.json
```
8 changes: 8 additions & 0 deletions pretrained/bert-base-uncased/README.md
@@ -0,0 +1,8 @@
[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-pink)](https://huggingface.co/bert-base-uncased)

# Download

- [Link](https://huggingface.co/bert-base-uncased)
- Model: `bert-base-uncased`
- Download the latest files under the `Files and versions` tab.
- Overwrite this file if necessary.
8 changes: 8 additions & 0 deletions pretrained/facebook/bart-base/README.md
@@ -0,0 +1,8 @@
[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-pink)](https://huggingface.co/facebook/bart-base)

# Download

- [Link](https://huggingface.co/facebook/bart-base)
- Model: `facebook/bart-base`
- Download the latest files under the `Files and versions` tab.
- Overwrite this file if necessary.
8 changes: 8 additions & 0 deletions pretrained/roberta-base/README.md
@@ -0,0 +1,8 @@
[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-pink)](https://huggingface.co/roberta-base)

# Download

- [Link](https://huggingface.co/roberta-base)
- Model: `roberta-base`
- Download the latest files under the `Files and versions` tab.
- Overwrite this file if necessary.