Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 85 changed files with 708 additions and 96 deletions.
@@ -1,14 +1,14 @@
 *.wav
-/test/**/
-!/test/sample1_16k.wav
-!/test/sample2_22k.wav
-!/test/sample3_48k.wav
-!/test/sample4_16k.wav
-!/test/sample5_16k.wav
-!/test/sample6_48k.wav
-!/test/sample7_16k.wav
-!/test/sample8_16k.wav
-!/test/sample9_48k.wav
-!/test/sample10_16k.wav
+/AudCodec/test/**/
+!/AudCodec/test/sample1_16k.wav
+!/AudCodec/test/sample2_22k.wav
+!/AudCodec/test/sample3_48k.wav
+!/AudCodec/test/sample4_16k.wav
+!/AudCodec/test/sample5_16k.wav
+!/AudCodec/test/sample6_48k.wav
+!/AudCodec/test/sample7_16k.wav
+!/AudCodec/test/sample8_16k.wav
+!/AudCodec/test/sample9_48k.wav
+!/AudCodec/test/sample10_16k.wav
 
 __pycache__/
File renamed without changes.
Empty file.
2 changes: 1 addition & 1 deletion
base_codec/descript_audio_codec.py → AudCodec/base_codec/descript_audio_codec.py
File renamed without changes.
2 changes: 1 addition & 1 deletion
base_codec/speech_tokenizer.py → AudCodec/base_codec/speech_tokenizer.py
File renamed without changes.
2 changes: 1 addition & 1 deletion
codec/academicodec_hifi_16k_320d.py → AudCodec/codec/academicodec_hifi_16k_320d.py
2 changes: 1 addition & 1 deletion
...c/academicodec_hifi_16k_320d_large_uni.py → ...c/academicodec_hifi_16k_320d_large_uni.py
2 changes: 1 addition & 1 deletion
codec/academicodec_hifi_24k_320d.py → AudCodec/codec/academicodec_hifi_24k_320d.py
2 changes: 1 addition & 1 deletion
codec/audiodec_24k_320d.py → AudCodec/codec/audiodec_24k_320d.py
@@ -1,4 +1,4 @@
-from base_codec.audiodec import BaseCodec
+from AudCodec.base_codec.audiodec import BaseCodec
 import nlp2
 
 
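The one-line import changes throughout this commit follow from moving the modules under the `AudCodec` package. A minimal sketch of how such a rewrite could be automated is shown here. `qualify_imports`, its parameters, and the default module list are hypothetical names chosen for illustration; nothing in the commit indicates the change was scripted rather than made by hand.

```python
import re


def qualify_imports(source, package="AudCodec", modules=("base_codec",)):
    """Prefix imports of the listed top-level modules with `package`.

    Hypothetical helper: it only handles `from X...` and `import X...`
    statements that start a line, which is the form seen in this diff.
    """
    names = "|".join(re.escape(m) for m in modules)
    # Group 1 keeps the `from `/`import ` keyword, group 2 is the module name.
    pattern = re.compile(
        rf"^([ \t]*(?:from|import)[ \t]+)({names})(?=[.\s]|$)",
        re.MULTILINE,
    )
    return pattern.sub(rf"\g<1>{package}.\g<2>", source)


old_source = "from base_codec.audiodec import BaseCodec\nimport nlp2\n"
new_source = qualify_imports(old_source)
```

Lines that are already qualified (`from AudCodec.base_codec...`) do not match the pattern, so running the helper twice leaves the source unchanged.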
2 changes: 1 addition & 1 deletion
codec/encodec_24k_12bps.py → AudCodec/codec/encodec_24k_12bps.py
2 changes: 1 addition & 1 deletion
codec/encodec_24k_1_5bps.py → AudCodec/codec/encodec_24k_1_5bps.py
2 changes: 1 addition & 1 deletion
codec/encodec_24k_24bps.py → AudCodec/codec/encodec_24k_24bps.py
2 changes: 1 addition & 1 deletion
.../funcodec_en_libritts_16k_gr1nq32ds320.py → .../funcodec_en_libritts_16k_gr1nq32ds320.py
2 changes: 1 addition & 1 deletion
.../funcodec_en_libritts_16k_gr8nq32ds320.py → .../funcodec_en_libritts_16k_gr8nq32ds320.py
2 changes: 1 addition & 1 deletion
codec/funcodec_en_libritts_16k_nq32ds320.py → ...dec/funcodec_en_libritts_16k_nq32ds320.py
2 changes: 1 addition & 1 deletion
codec/funcodec_en_libritts_16k_nq32ds640.py → ...dec/funcodec_en_libritts_16k_nq32ds640.py
2 changes: 1 addition & 1 deletion
codec/funcodec_zh_en_16k_nq32ds320.py → ...dec/codec/funcodec_zh_en_16k_nq32ds320.py
2 changes: 1 addition & 1 deletion
codec/funcodec_zh_en_16k_nq32ds640.py → ...dec/codec/funcodec_zh_en_16k_nq32ds640.py
2 changes: 1 addition & 1 deletion
codec/speech_tokenizer_16k.py → AudCodec/codec/speech_tokenizer_16k.py
@@ -1,4 +1,3 @@
-import dataset.general
 def load_dataset(dataset_name):
     module = __import__(f"dataset.{dataset_name}", fromlist=[dataset_name])
     return module.load_data()
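`load_dataset` resolves a dataset module by name at call time, so a dataset module only needs to expose `load_data()`. The sketch below shows the same lookup written with `importlib.import_module`, which returns the submodule directly without the `fromlist` argument. The `package` parameter and the stand-in `demo_datasets` package are assumptions added so the example runs outside the repository; they are not part of the commit.

```python
import importlib
import sys
import types


def load_dataset(dataset_name, package="dataset"):
    """Import `<package>.<dataset_name>` and return the result of its load_data()."""
    module = importlib.import_module(f"{package}.{dataset_name}")
    return module.load_data()


# Stand-in package and module (hypothetical) so the sketch runs without the repository.
demo_package = types.ModuleType("demo_datasets")
demo_package.__path__ = []  # an empty __path__ marks the stand-in as a package
demo_module = types.ModuleType("demo_datasets.tiny")
demo_module.load_data = lambda: ["sample1_16k.wav"]
sys.modules["demo_datasets"] = demo_package
sys.modules["demo_datasets.tiny"] = demo_module

samples = load_dataset("tiny", package="demo_datasets")
```

Because the module is imported by name, a misspelled dataset name surfaces as a `ModuleNotFoundError` at call time rather than at import time of the loader.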
File renamed without changes. (14 files)
Empty file.
File renamed without changes. (11 files)
@@ -0,0 +1,26 @@
+# Contributing to Codec-SUPERB
+
+We welcome contributions to Codec-SUPERB in several areas: models, datasets, and metrics. Here's how you can contribute:
+
+## Contributing Models
+
+1. Fork the Codec-SUPERB repository.
+2. Add your model to the `models` directory. Please ensure your model adheres to the interface defined in `models/README.md`.
+3. Add tests for your model in the `tests` directory.
+4. Submit a pull request with your changes. Please include a detailed description of your model and how it improves Codec-SUPERB.
+
+## Contributing Datasets
+
+1. Fork the Codec-SUPERB repository.
+2. Add your dataset to the `datasets` directory. Please ensure your dataset adheres to the format defined in `datasets/README.md`.
+3. Add tests for your dataset in the `tests` directory.
+4. Submit a pull request with your changes. Please include a detailed description of your dataset and how it improves Codec-SUPERB.
+
+## Contributing Metrics
+
+1. Fork the Codec-SUPERB repository.
+2. Add your metric to the `metrics` directory. Please ensure your metric adheres to the interface defined in `metrics/README.md`.
+3. Add tests for your metric in the `tests` directory.
+4. Submit a pull request with your changes. Please include a detailed description of your metric and how it improves Codec-SUPERB.
+
+We look forward to your contributions!
@@ -1,36 +1,78 @@
-# Audio Codec Benchmark
+# Codec-SUPERB: Audio Codec Speech Processing Universal Performance Benchmark
 
-## Codec Collection:
+[image: Overview]
 
-- https://github.com/ZhangXInFD/SpeechTokenizer
-- https://github.com/descriptinc/descript-audio-codec
-- https://github.com/facebookresearch/encodec
-- https://github.com/yangdongchao/AcademiCodec
-- https://github.com/facebookresearch/AudioDec
-- https://github.com/alibaba-damo-academy/FunCodec
+Codec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. Our
+goal is to facilitate community collaboration and accelerate advancements in the field of speech processing by
+preserving and enhancing speech information quality.
 
+
+## Table of Contents
+
+- [Introduction](#introduction)
+- [Key Features](#key-features)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Benchmarking](#benchmarking)
+- [Contribution](#contribution)
+- [License](#license)
+
+## Introduction
+
+Codec-SUPERB sets a new benchmark in evaluating audio codec models, providing a rigorous and transparent framework for
+assessing performance across a range of speech processing tasks. Our goal is to foster innovation and set new standards
+in audio quality and processing efficiency.
+
+## Key Features
 
-## Criteria
+### Out-of-the-Box Codec Interface
+Codec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various
+codec models, facilitating quick iterations and experiments.
 
-### Waveform (Lower is better)
+### Multi-Perspective Leaderboard
+Codec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in audio codec research by providing a comprehensive assessment and fostering competitive transparency among developers.
 
-L1Loss in waveform
+### Standardized Environment
+We ensure a standardized testing environment to guarantee fair and consistent comparison across all models. This
+uniformity brings reliability to benchmark results, making them universally interpretable.
 
-### Mel Distance (Lower is better)
+### Unified Datasets
+We provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures
+that models are evaluated under diverse conditions, reflecting real-world applications.
 
-The Mel Distance is the distance between the log mel spectrograms of the reconstructed and ground truth waveforms.
+## Installation
 
-### STFT Distance (Lower is better)
+```bash
+git clone https://github.com/voidful/Codec-SUPERB.git
+cd Codec-SUPERB
+pip install -r requirements.txt
+```
 
-This metric calculates the distance between the log magnitude spectrograms of the reconstructed and ground truth
-waveforms, using window lengths of [2048, 512], and is better at capturing fidelity in higher frequencies compared to
-the Mel Distance.
+## Usage
 
-### PESQ (Higher is better)
+Detailed instructions on how to use Codec-SUPERB, including preparing your codec model and executing benchmark tests,
+can be found in the `docs` directory.
 
-PESQ is an intrusive perceptual quality metric for automated assessment of the speech quality. We adopt ITU-T P.862.2 (wideband).
+## Benchmarking
 
-### STOI (Higher is better)
+Codec-SUPERB supports a comprehensive suite of speech tasks, from speech recognition to audio quality assessment, each
+designed to rigorously evaluate the capabilities of audio codec models.
 
-STOI is an intrusive perceptual quality metric that assesses audio quality based on the intelligibility of the
-reconstructed speech.
+## Contribution
+
+Contributions are highly encouraged, whether it's through adding new codec models, expanding the dataset collection, or
+enhancing the benchmarking framework. Please see `CONTRIBUTING.md` for more details.
+
+## License
+
+This project is licensed under the MIT License - see the `LICENSE` file for details.
+
+
+## Reference Audio Codec Repositories:
+
+- https://github.com/ZhangXInFD/SpeechTokenizer
+- https://github.com/descriptinc/descript-audio-codec
+- https://github.com/facebookresearch/encodec
+- https://github.com/yangdongchao/AcademiCodec
+- https://github.com/facebookresearch/AudioDec
+- https://github.com/alibaba-damo-academy/FunCodec
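The README text replaced in this hunk described two signal-level criteria: an L1 loss on the waveform, and an STFT distance between log magnitude spectrograms at window lengths of [2048, 512]. A rough NumPy sketch of both follows. Only the L1 waveform loss and the two window lengths come from the README; the Hann window, the hop of a quarter window, the base-10 log with a floor, the L1 distance between spectrograms, and the averaging over window lengths are all assumptions, so this should not be read as the benchmark's own implementation.

```python
import numpy as np


def l1_waveform_loss(reference, reconstructed):
    """Mean absolute difference between two waveforms (lower is better)."""
    n = min(len(reference), len(reconstructed))  # compare only the overlap
    ref = np.asarray(reference[:n], dtype=np.float64)
    rec = np.asarray(reconstructed[:n], dtype=np.float64)
    return float(np.mean(np.abs(ref - rec)))


def log_magnitude_spectrogram(signal, window_length, eps=1e-5):
    """Log10 magnitude STFT; Hann window and hop = window/4 are assumptions."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < window_length:
        signal = np.pad(signal, (0, window_length - len(signal)))
    hop = window_length // 4
    window = np.hanning(window_length)
    starts = range(0, len(signal) - window_length + 1, hop)
    frames = np.stack([signal[s:s + window_length] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log10(np.maximum(magnitude, eps))  # floor avoids log(0)


def stft_distance(reference, reconstructed, window_lengths=(2048, 512)):
    """Average L1 distance between log magnitude spectrograms (lower is better)."""
    n = min(len(reference), len(reconstructed))
    ref, rec = reference[:n], reconstructed[:n]
    distances = [
        np.mean(np.abs(log_magnitude_spectrogram(ref, w)
                       - log_magnitude_spectrogram(rec, w)))
        for w in window_lengths
    ]
    return float(np.mean(distances))


# A 440 Hz tone at 16 kHz and an attenuated copy standing in for a codec output.
t = np.arange(4096) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
degraded = 0.5 * clean
```

Calling `stft_distance(clean, clean)` gives zero, and any change in level or spectral content, as in `degraded`, raises the distance. PESQ and STOI are perceptual metrics defined by external standards and are not sketched here.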
Empty file. (4 files)