diff --git a/README_CN.md b/README_CN.md
index 8bc42e2..704660e 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -54,14 +54,14 @@ Overview of the three-stage training process of Qwen2-Audio.

(Note: the evaluation results shown here are from the initial model trained in the original framework; after conversion to Huggingface, some metrics fluctuated. We therefore present all of our evaluation results. First, the initial-model results reported in the paper:)
| Task | Dataset | Model | Metrics | Results |
|---|---|---|---|---|
| ASR | Librispeech<br>dev-clean \| dev-other \| test-clean \| test-other | SpeechT5 | WER | 2.1 \| 5.5 \| 2.4 \| 5.8 |
| | | SpeechNet | | - \| - \| 30.7 \| - |
| | | SLM-FT | | - \| - \| 2.6 \| 5.0 |
| | | SALMONN | | - \| - \| 2.1 \| 4.9 |
| | | SpeechVerse | | - \| - \| 2.1 \| 4.4 |
| | | Qwen-Audio | | 1.8 \| 4.0 \| 2.0 \| 4.2 |
| | | Qwen2-Audio | | 1.3 \| 3.4 \| 1.6 \| 3.6 |
| | Common Voice 15<br>en \| zh \| yue \| fr | Whisper-large-v3 | WER | 9.3 \| 12.8 \| 10.9 \| 10.8 |
| | | Qwen2-Audio | | 8.6 \| 6.9 \| 5.9 \| 9.6 |
| | Fleurs<br>zh | Whisper-large-v3 | WER | 7.7 |
| | | Qwen2-Audio | | 7.5 |
| | Aishell2<br>Mic \| iOS \| Android | MMSpeech-base | WER | 4.5 \| 3.9 \| 4.0 |
| | | Paraformer-large | | - \| 2.9 \| - |
| | | Qwen-Audio | | 3.3 \| 3.1 \| 3.3 |
| | | Qwen2-Audio | | 3.0 \| 3.0 \| 2.9 |
| S2TT | CoVoST2<br>en-de \| de-en \| en-zh \| zh-en | SALMONN | BLEU | 18.6 \| - \| 33.1 \| - |
| | | SpeechLLaMA | | - \| 27.1 \| - \| 12.3 |
| | | BLSP | | 14.1 \| - \| - \| - |
| | | Qwen-Audio | | 25.1 \| 33.9 \| 41.5 \| 15.7 |
| | | Qwen2-Audio | | 29.9 \| 35.2 \| 45.2 \| 24.4 |
| | CoVoST2<br>es-en \| fr-en \| it-en | SpeechLLaMA | BLEU | 27.9 \| 25.2 \| 25.9 |
| | | Qwen-Audio | | 39.7 \| 38.5 \| 36.0 |
| | | Qwen2-Audio | | 40.0 \| 38.5 \| 36.3 |
| SER | Meld | WavLM-large | ACC | 0.542 |
| | | Qwen-Audio | | 0.557 |
| | | Qwen2-Audio | | 0.553 |
| VSC | VocalSound | CLAP | ACC | 0.4945 |
| | | Pengi | | 0.6035 |
| | | Qwen-Audio | | 0.9289 |
| | | Qwen2-Audio | | 0.9392 |
| AIR-Bench | Chat Benchmark<br>Speech \| Sound \| Music \| Mixed-Audio | SALMONN | GPT-4 | 6.16 \| 6.28 \| 5.95 \| 6.08 |
| | | BLSP | | 6.17 \| 5.55 \| 5.08 \| 5.33 |
| | | Pandagpt | | 3.58 \| 5.46 \| 5.06 \| 4.25 |
| | | Macaw-LLM | | 0.97 \| 1.01 \| 0.91 \| 1.01 |
| | | SpeechGPT | | 1.57 \| 0.95 \| 0.95 \| 4.13 |
| | | Next-gpt | | 3.86 \| 4.76 \| 4.18 \| 4.13 |
| | | Qwen-Audio | | 6.47 \| 6.95 \| 5.52 \| 6.08 |
| | | Gemini-1.5-pro | | 6.97 \| 5.49 \| 5.06 \| 5.27 |
| | | Qwen2-Audio | | 7.18 \| 6.99 \| 6.79 \| 6.77 |
Next, the results of the model after conversion to the Huggingface framework:

| Task | Dataset | Model | Metrics | Results |
|---|---|---|---|---|
| ASR | Librispeech<br>dev-clean \| dev-other \| test-clean \| test-other | SpeechT5 | WER | 2.1 \| 5.5 \| 2.4 \| 5.8 |
| | | SpeechNet | | - \| - \| 30.7 \| - |
| | | SLM-FT | | - \| - \| 2.6 \| 5.0 |
| | | SALMONN | | - \| - \| 2.1 \| 4.9 |
| | | SpeechVerse | | - \| - \| 2.1 \| 4.4 |
| | | Qwen-Audio | | 1.8 \| 4.0 \| 2.0 \| 4.2 |
| | | Qwen2-Audio | | 1.7 \| 3.6 \| 1.7 \| 4.0 |
| | Common Voice 15<br>en \| zh \| yue \| fr | Whisper-large-v3 | WER | 9.3 \| 12.8 \| 10.9 \| 10.8 |
| | | Qwen2-Audio | | 8.7 \| 6.5 \| 5.9 \| 9.6 |
| | Fleurs<br>zh | Whisper-large-v3 | WER | 7.7 |
| | | Qwen2-Audio | | 7.0 |
| | Aishell2<br>Mic \| iOS \| Android | MMSpeech-base | WER | 4.5 \| 3.9 \| 4.0 |
| | | Paraformer-large | | - \| 2.9 \| - |
| | | Qwen-Audio | | 3.3 \| 3.1 \| 3.3 |
| | | Qwen2-Audio | | 3.2 \| 3.1 \| 2.9 |
| S2TT | CoVoST2<br>en-de \| de-en \| en-zh \| zh-en | SALMONN | BLEU | 18.6 \| - \| 33.1 \| - |
| | | SpeechLLaMA | | - \| 27.1 \| - \| 12.3 |
| | | BLSP | | 14.1 \| - \| - \| - |
| | | Qwen-Audio | | 25.1 \| 33.9 \| 41.5 \| 15.7 |
| | | Qwen2-Audio | | 29.6 \| 33.6 \| 45.6 \| 24.0 |
| | CoVoST2<br>es-en \| fr-en \| it-en | SpeechLLaMA | BLEU | 27.9 \| 25.2 \| 25.9 |
| | | Qwen-Audio | | 39.7 \| 38.5 \| 36.0 |
| | | Qwen2-Audio | | 38.7 \| 37.2 \| 35.2 |
| SER | Meld | WavLM-large | ACC | 0.542 |
| | | Qwen-Audio | | 0.557 |
| | | Qwen2-Audio | | 0.535 |
| VSC | VocalSound | CLAP | ACC | 0.4945 |
| | | Pengi | | 0.6035 |
| | | Qwen-Audio | | 0.9289 |
| | | Qwen2-Audio | | 0.9395 |
| AIR-Bench | Chat Benchmark<br>Speech \| Sound \| Music \| Mixed-Audio | SALMONN | GPT-4 | 6.16 \| 6.28 \| 5.95 \| 6.08 |
| | | BLSP | | 6.17 \| 5.55 \| 5.08 \| 5.33 |
| | | Pandagpt | | 3.58 \| 5.46 \| 5.06 \| 4.25 |
| | | Macaw-LLM | | 0.97 \| 1.01 \| 0.91 \| 1.01 |
| | | SpeechGPT | | 1.57 \| 0.95 \| 0.95 \| 4.13 |
| | | Next-gpt | | 3.86 \| 4.76 \| 4.18 \| 4.13 |
| | | Qwen-Audio | | 6.47 \| 6.95 \| 5.52 \| 6.08 |
| | | Gemini-1.5-pro | | 6.97 \| 5.49 \| 5.06 \| 5.27 |
| | | Qwen2-Audio | | 7.24 \| 6.83 \| 6.73 \| 6.42 |
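For reference, the ASR rows above report Word Error Rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference word count. The sketch below is an illustrative, minimal implementation (the `wer` function is our own, not the evaluation code used for these tables, which applies its own text normalization and tooling):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance (illustrative only)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference
# words gives 2/6 ≈ 0.333, i.e. a WER of 33.3%.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))
```

A lower WER is better; the tables above report it as a percentage (e.g. 1.7 means 1.7%).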