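diff --git a/README_CN.md b/README_CN.md
index 8bc42e2..704660e 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -54,14 +54,14 @@
 Overview of the three-stage training process of Qwen2-Audio.

 (Note: the evaluation results we report were obtained with the initial model in the original training framework. After the model was converted to the Huggingface framework, some metrics fluctuated, so we present all of our evaluation results here: first, the initial model results from the paper.)

- +
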
Task: ASR. Dataset: Librispeech. Metric: WER.

| Model | dev-clean | dev-other | test-clean | test-other |
| --- | --- | --- | --- | --- |
| SpeechT5 | 2.1 | 5.5 | 2.4 | 5.8 |
| SpeechNet | - | - | 30.7 | - |
| SLM-FT | - | - | 2.6 | 5.0 |
| SALMONN | - | - | 2.1 | 4.9 |
| SpeechVerse | - | - | 2.1 | 4.4 |
| Qwen-Audio | 1.8 | 4.0 | 2.0 | 4.2 |
| Qwen2-Audio | 1.3 | 3.4 | 1.6 | 3.6 |

Task: ASR. Dataset: Common Voice 15. Metric: WER.

| Model | en | zh | yue | fr |
| --- | --- | --- | --- | --- |
| Whisper-large-v3 | 9.3 | 12.8 | 10.9 | 10.8 |
| Qwen2-Audio | 8.6 | 6.9 | 5.9 | 9.6 |

Task: ASR. Dataset: Fleurs. Metric: WER.

| Model | zh |
| --- | --- |
| Whisper-large-v3 | 7.7 |
| Qwen2-Audio | 7.5 |

Task: ASR. Dataset: Aishell2. Metric: WER.

| Model | Mic | iOS | Android |
| --- | --- | --- | --- |
| MMSpeech-base | 4.5 | 3.9 | 4.0 |
| Paraformer-large | - | 2.9 | - |
| Qwen-Audio | 3.3 | 3.1 | 3.3 |
| Qwen2-Audio | 3.0 | 3.0 | 2.9 |

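Task: S2TT. Dataset: CoVoST2. Metric: BLEU.

| Model | en-de | de-en | en-zh | zh-en |
| --- | --- | --- | --- | --- |
| SALMONN | 18.6 | - | 33.1 | - |
| SpeechLLaMA | - | 27.1 | - | 12.3 |
| BLSP | 14.1 | - | - | - |
| Qwen-Audio | 25.1 | 33.9 | 41.5 | 15.7 |
| Qwen2-Audio | 29.9 | 35.2 | 45.2 | 24.4 |
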
Task: S2TT. Dataset: CoVoST2. Metric: BLEU.

| Model | es-en | fr-en | it-en |
| --- | --- | --- | --- |
| SpeechLLaMA | 27.9 | 25.2 | 25.9 |
| Qwen-Audio | 39.7 | 38.5 | 36.0 |
| Qwen2-Audio | 40.0 | 38.5 | 36.3 |

Task: SER. Dataset: Meld. Metric: ACC.

| Model | ACC |
| --- | --- |
| WavLM-large | 0.542 |
| Qwen-Audio | 0.557 |
| Qwen2-Audio | 0.553 |

Task: VSC. Dataset: VocalSound. Metric: ACC.

| Model | ACC |
| --- | --- |
| CLAP | 0.4945 |
| Pengi | 0.6035 |
| Qwen-Audio | 0.9289 |
| Qwen2-Audio | 0.9392 |

Task: AIR-Bench. Dataset: Chat Benchmark. Metric: GPT-4.

| Model | Speech | Sound | Music | Mixed-Audio |
| --- | --- | --- | --- | --- |
| SALMONN | 6.16 | 6.28 | 5.95 | 6.08 |
| BLSP | 6.17 | 5.55 | 5.08 | 5.33 |
| Pandagpt | 3.58 | 5.46 | 5.06 | 4.25 |
| Macaw-LLM | 0.97 | 1.01 | 0.91 | 1.01 |
| SpeechGPT | 1.57 | 0.95 | 0.95 | 4.13 |
| Next-gpt | 3.86 | 4.76 | 4.18 | 4.13 |
| Qwen-Audio | 6.47 | 6.95 | 5.52 | 6.08 |
| Gemini-1.5-pro | 6.97 | 5.49 | 5.06 | 5.27 |
| Qwen2-Audio | 7.18 | 6.99 | 6.79 | 6.77 |

(Second, the results after conversion to Huggingface:)

- +

Task: ASR. Dataset: Librispeech. Metric: WER.

| Model | dev-clean | dev-other | test-clean | test-other |
| --- | --- | --- | --- | --- |
| SpeechT5 | 2.1 | 5.5 | 2.4 | 5.8 |
| SpeechNet | - | - | 30.7 | - |
| SLM-FT | - | - | 2.6 | 5.0 |
| SALMONN | - | - | 2.1 | 4.9 |
| SpeechVerse | - | - | 2.1 | 4.4 |
| Qwen-Audio | 1.8 | 4.0 | 2.0 | 4.2 |
| Qwen2-Audio | 1.7 | 3.6 | 1.7 | 4.0 |

Task: ASR. Dataset: Common Voice 15. Metric: WER.

| Model | en | zh | yue | fr |
| --- | --- | --- | --- | --- |
| Whisper-large-v3 | 9.3 | 12.8 | 10.9 | 10.8 |
| Qwen2-Audio | 8.7 | 6.5 | 5.9 | 9.6 |

Task: ASR. Dataset: Fleurs. Metric: WER.

| Model | zh |
| --- | --- |
| Whisper-large-v3 | 7.7 |
| Qwen2-Audio | 7.0 |

Task: ASR. Dataset: Aishell2. Metric: WER.

| Model | Mic | iOS | Android |
| --- | --- | --- | --- |
| MMSpeech-base | 4.5 | 3.9 | 4.0 |
| Paraformer-large | - | 2.9 | - |
| Qwen-Audio | 3.3 | 3.1 | 3.3 |
| Qwen2-Audio | 3.2 | 3.1 | 2.9 |

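Task: S2TT. Dataset: CoVoST2. Metric: BLEU.

| Model | en-de | de-en | en-zh | zh-en |
| --- | --- | --- | --- | --- |
| SALMONN | 18.6 | - | 33.1 | - |
| SpeechLLaMA | - | 27.1 | - | 12.3 |
| BLSP | 14.1 | - | - | - |
| Qwen-Audio | 25.1 | 33.9 | 41.5 | 15.7 |
| Qwen2-Audio | 29.6 | 33.6 | 45.6 | 24.0 |
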
Task: S2TT. Dataset: CoVoST2. Metric: BLEU.

| Model | es-en | fr-en | it-en |
| --- | --- | --- | --- |
| SpeechLLaMA | 27.9 | 25.2 | 25.9 |
| Qwen-Audio | 39.7 | 38.5 | 36.0 |
| Qwen2-Audio | 38.7 | 37.2 | 35.2 |

Task: SER. Dataset: Meld. Metric: ACC.

| Model | ACC |
| --- | --- |
| WavLM-large | 0.542 |
| Qwen-Audio | 0.557 |
| Qwen2-Audio | 0.535 |

Task: VSC. Dataset: VocalSound. Metric: ACC.

| Model | ACC |
| --- | --- |
| CLAP | 0.4945 |
| Pengi | 0.6035 |
| Qwen-Audio | 0.9289 |
| Qwen2-Audio | 0.9395 |

Task: AIR-Bench. Dataset: Chat Benchmark. Metric: GPT-4.

| Model | Speech | Sound | Music | Mixed-Audio |
| --- | --- | --- | --- | --- |
| SALMONN | 6.16 | 6.28 | 5.95 | 6.08 |
| BLSP | 6.17 | 5.55 | 5.08 | 5.33 |
| Pandagpt | 3.58 | 5.46 | 5.06 | 4.25 |
| Macaw-LLM | 0.97 | 1.01 | 0.91 | 1.01 |
| SpeechGPT | 1.57 | 0.95 | 0.95 | 4.13 |
| Next-gpt | 3.86 | 4.76 | 4.18 | 4.13 |
| Qwen-Audio | 6.47 | 6.95 | 5.52 | 6.08 |
| Gemini-1.5-pro | 6.97 | 5.49 | 5.06 | 5.27 |
| Qwen2-Audio | 7.24 | 6.83 | 6.73 | 6.42 |

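
To make the fluctuation after the Huggingface conversion easier to see, here is a minimal sketch that subtracts the original Qwen2-Audio scores from the converted ones for three of the benchmarks. The numbers are copied from the two result sets in this README. The names `ORIGINAL`, `CONVERTED`, and `deltas` are illustrative assumptions and are not part of the repository.

```python
# Qwen2-Audio scores copied from the two result sets in this README.
# The dictionary layout and the helper below are illustrative only.
ORIGINAL = {
    "Librispeech WER (dev-clean, dev-other, test-clean, test-other)": [1.3, 3.4, 1.6, 3.6],
    "CoVoST2 BLEU (en-de, de-en, en-zh, zh-en)": [29.9, 35.2, 45.2, 24.4],
    "AIR-Bench GPT-4 (Speech, Sound, Music, Mixed-Audio)": [7.18, 6.99, 6.79, 6.77],
}
CONVERTED = {
    "Librispeech WER (dev-clean, dev-other, test-clean, test-other)": [1.7, 3.6, 1.7, 4.0],
    "CoVoST2 BLEU (en-de, de-en, en-zh, zh-en)": [29.6, 33.6, 45.6, 24.0],
    "AIR-Bench GPT-4 (Speech, Sound, Music, Mixed-Audio)": [7.24, 6.83, 6.73, 6.42],
}


def deltas(original, converted):
    """Return (converted - original) per benchmark, rounded to 2 decimals."""
    return {
        name: [round(c - o, 2) for o, c in zip(original[name], converted[name])]
        for name in original
    }


if __name__ == "__main__":
    for name, diff in deltas(ORIGINAL, CONVERTED).items():
        print(f"{name}: {diff}")
```

When reading the output, note that WER is lower-is-better, so a positive delta there means the converted model scored slightly worse, while for BLEU and the AIR-Bench scores a positive delta means an improvement.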