Comparison between base and CTC models #10
test.zip (attachment)
I am also bothered by this. Here is what I found: when not using the posthoc bottleneck configurations, the model calls the underlying FSQ implementation directly. After checking the source code, I found that the default setting for FSQ (without calling the posthoc bottleneck) corresponds to:

    self.preset_bottleneck_configs = {
        ...
        "1x24137569_625bps": [
            ([17, 17, 17, 17, 17, 17], 1.0)
        ],
    }

and this is the configuration that gets used. However, by testing on some utterances, I found that calling the posthoc bottleneck gives results that differ from (and sound somewhat better than) the default FSQ path.
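For reference, a minimal sketch (not from this repo) of how that preset name can be read off the level list. It assumes a 25 Hz token rate, chosen only so that the "1x24137569_625bps" label works out; check the model config for the real frame rate:

    import math

    def bottleneck_stats(levels, frame_rate_hz=25.0):
        # Product of per-dimension levels gives the effective codebook size:
        # 17 ** 6 == 24_137_569, hence the "1x24137569" part of the name.
        codebook_size = math.prod(levels)
        # Bits needed to index one frame: ceil(log2(24_137_569)) == 25.
        bits_per_frame = math.ceil(math.log2(codebook_size))
        # 25 bits/frame * 25 frames/s == 625 bps, the "_625bps" suffix.
        bitrate_bps = bits_per_frame * frame_rate_hz
        return codebook_size, bits_per_frame, bitrate_bps

    print(bottleneck_stats([17, 17, 17, 17, 17, 17]))
    # (24137569, 25, 625.0)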
Yes, this is a known bug in the upstream FSQ implementation. Is there a practical reason to use anything other than the currently available posthoc bottlenecks? The 700bps and 1000bps residual versions both have the same accuracy as the original 17-level version, while having a manageable codebook size. If someone has a practical use case for anything else, please share it.
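For context, a rough sketch (an illustration under assumptions, not this repo's code) of what the "residual" presets mean: each stage quantizes whatever the previous stage left over, so a few small codebooks can stand in for one huge one. The (levels, scale) pairs mirror the format of the preset_bottleneck_configs entries above:

    import torch

    def fsq_round(z, levels, scale=1.0):
        # Minimal FSQ-style rounding: bound each dimension to [-scale, scale]
        # and snap it to a uniform grid with levels[i] points.
        half = (torch.tensor(levels, dtype=z.dtype) - 1) / 2
        z = torch.tanh(z / scale) * scale
        return torch.round(z / scale * half) / half * scale

    def residual_fsq(z, stages):
        # stages: list of (levels, scale) pairs, like the preset entries above.
        quantized, residual = torch.zeros_like(z), z
        for levels, scale in stages:
            q = fsq_round(residual, levels, scale)
            quantized = quantized + q
            residual = residual - q
        return quantized

    z = torch.randn(2, 6)
    # Two small stages instead of one 17-level stage (values are illustrative).
    print(residual_fsq(z, [([9] * 6, 1.0), ([9] * 6, 0.25)]))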
Objective metrics are available at the bottom of the README. The CTC version performs slightly worse according to those metrics, but I 100% agree that perceptually it's better.
Hi, I tried both the CTC model and the base model, and it seems that the bug I encountered previously hasn't been fixed yet. The posthoc results are still somewhat better than the original FSQ.
Additionally, the CTC-loss model seems to produce clearer pronunciation and overall better quality. Have any specific evaluations or comparisons been done on this?
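Not an official evaluation, but one way to put a number on the "clearer pronunciation" impression is to transcribe reconstructions from both checkpoints with an off-the-shelf ASR model and compare word error rates against the reference text. The file paths below are hypothetical, and whisper/jiwer are just example tools, not anything this repo ships:

    import jiwer
    import whisper  # openai-whisper, used here only as an external judge

    asr = whisper.load_model("base")

    # Hypothetical layout: the same test utterances reconstructed by each model.
    refs = ["ground-truth transcript of utterance one"]
    recons = {
        "base": ["recons/base/utt0001.wav"],
        "ctc": ["recons/ctc/utt0001.wav"],
    }

    for name, paths in recons.items():
        hyps = [asr.transcribe(p)["text"].lower().strip() for p in paths]
        print(name, "WER:", jiwer.wer(refs, hyps))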