Also relevant: https://www.arxiv.org/pdf/2407.12707 On "TTS Arena", UTMOSv1 has only a weak correlation with the leaderboard, while WVMOS correlates much better. I haven't tested UTMOSv2 and WVMOS myself yet, but in my experiments UTMOSv1 does not necessarily lead to a correct voice-quality evaluation.
I have already used these models in my work, which is cleaning audio data across different languages. WV-MOS sometimes predicts scores like -0.1 or 0.123; when I listen to those clips, many of them are pure noise or over-aggressively noise-reduced. If you bin scores at 0.1 intervals (e.g. 2.0, 2.1, 2.2), the differences between adjacent bins are hard to hear. If you instead compare clips at 2.0 against clips at 3.0, the distinction is much clearer. Also, on multilingual clips, some languages consistently score lower than American English, such as Chinese, Cantonese, and even British English.
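The coarse-threshold filtering described above can be sketched as follows. This is a minimal illustration, not WV-MOS itself: the `raw` score dictionary is made-up example data standing in for a real predictor's output, and the clamping of negative scores reflects the observation that WV-MOS can emit values below 0.

```python
def clamp_score(score: float) -> float:
    """Clamp a raw predictor output into the nominal 0-5 MOS range."""
    return min(max(score, 0.0), 5.0)

def filter_clips(scores: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Keep clip paths whose clamped MOS is at or above `threshold`.

    A coarse threshold (whole-point steps, e.g. 2.0 vs 3.0) separates
    audible quality levels better than 0.1-wide steps, per the
    discussion above.
    """
    return [path for path, s in scores.items() if clamp_score(s) >= threshold]

# Made-up scores, including a negative WV-MOS-style output:
raw = {"a.wav": -0.1, "b.wav": 2.1, "c.wav": 3.4, "d.wav": 4.2}
print(filter_clips(raw, threshold=3.0))  # ['c.wav', 'd.wav']
```

In practice the score dictionary would come from running a predictor such as WV-MOS over the corpus; only the thresholding step is shown here.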
However, UTMOSv1 seems to behave somewhat differently from WV-MOS: its scores appear more clearly separated. Of course, it also assigns different scores to different languages. I haven't checked whether UTMOSv1 can output negative numbers.
Add WV-MOS from https://arxiv.org/pdf/2203.13086 Code is here: https://github.com/AndreevP/wvmos/tree/main