Facodec and training #7
I tried FACodec training, but haven't succeeded. In general I am not satisfied with FACodec's performance.
Thanks for your answer.
I am also discouraged by FACodec's performance.
I would like to ask if you can describe in detail your impressions of the results after testing it.
If this question is for me: my main problem is that it is frame-based, and predicting these tokens would be challenging because of how many of them you need. Also, my tests didn't show what was promised in the paper: the codes are not disentangled. You still need the residual codes to get a nice voice, the content codes still depend on speaker identity, etc.
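To put a rough number on "how many of them you need": a back-of-envelope sketch of the token count for a frame-based codec. The frame rate and codebook count below are assumptions based on FACodec's reported configuration (roughly 80 frames per second across 6 codebooks); adjust for the actual model.

```python
# Back-of-envelope token count for a frame-based codec, illustrating why
# autoregressively predicting these tokens is costly. Values are
# assumptions (~80 frames/s, 6 codebooks), not confirmed specs.
frames_per_second = 80
num_codebooks = 6        # e.g. prosody + content + residual streams
seconds = 10

tokens = frames_per_second * num_codebooks * seconds
print(tokens)  # 4800 tokens for 10 s of audio
```

Even a short utterance yields thousands of tokens, which is why a language-model-style predictor over these codes becomes expensive quickly.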
I think he has emphasized that the key to NS3 is FACodec, but the number of parameters added and used in NS3 is indeed quite large.
@yiwei0730 I have the same conclusion as @ex3ndr: Gradient Reversal is not a 100% reliable mechanism, and FACodec relies mostly on it, which results in information leakage between the codes. That is why the codes are not properly disentangled. It might work within the NS3 architecture, because all codes are ultimately combined before being fed to the decoder, but if you plan to use the codec separately it won't work properly and will result in poor quality.
Well, I don't think the results are good enough to slam diffusion on top of it either: the problem with Voicebox is that it is too versatile and you need to control it, but these tokens are probably too tied to specific speech styles to be useful.
I saw that there is another library called super-gpt-facodec. Is there any chance I can connect them in series?
I also want to ask about training: should super-gpt-facodec and supervoice be trained separately? Is there a step-by-step guide I can follow?