I managed to get everything running on Ubuntu under WSL on Windows 10. The VM is using Python 3.9.13. However...
Is it just me, or is the website demo set up with different models or weights?
The website's output is more accurate to the voice than my local copy's, and I'm pretty sure I followed USAGE.md to the letter...
Even with the reference voices provided (in the repository), both v1 and v2 are less accurate than the website when run on my machine.
In fact, in some cases the tone is completely off (for example, with my voice). With the Annie one the difference is very subtle, but it's there.
Hey there, @aytimothy! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's tackle this issue together!
I found a similar open issue titled "尝试了中文男性,感觉有浓重的日本口音" ("Tried a Chinese male voice; it seems to have a heavy Japanese accent"), which discusses voice accuracy concerns. However, it does not have any comments or solutions yet [1].
For your specific issue, differences in model performance between local and web demos can sometimes be due to variations in model versions, configurations, or even the environment setup. It might be helpful to verify if the same model weights and configurations are being used locally as on the website. Additionally, checking for any updates or differences in the USAGE.md documentation might provide further insights.
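One way to check the "same model weights and configurations" point is to fingerprint the local checkpoint files and compare the digests against a fresh download or against hashes shared by someone else. The snippet below is only a minimal sketch using the Python standard library: the `checkpoints` directory name and the `.pth`/`.json` suffixes are assumptions about a typical local layout, not something confirmed by the repository, and it cannot tell you which weights the website demo actually uses.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint_checkpoints(root, suffixes=(".pth", ".json")):
    """Map each matching file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix in suffixes
    }


if __name__ == "__main__":
    # Hypothetical directory name: point this at wherever your weights were unpacked.
    checkpoint_dir = Path("checkpoints")
    if checkpoint_dir.is_dir():
        for name, digest in fingerprint_checkpoints(checkpoint_dir).items():
            print(digest, name)
```

If two machines print identical digests for every file, a difference in output quality is more likely to come from the inference settings or the reference audio than from the weights themselves.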
Note: My voice's reference clip may be a little dirty, but I also tried a cleaner and shorter (about 20 seconds) clip and the results were the same.
Is this expected?
Related: #339