
Speech SFT #752

Open
zidsi opened this issue Jan 18, 2025 · 2 comments
zidsi commented Jan 18, 2025

Is speech SFT feasible for a new domain (e.g. a new language), so that all "linked" parts (Whisper, the LLM, and ChatTTS) learn in an end-to-end fashion? Or should one first try some sort of continued pre-training (CPT) of the individual parts to improve them on the new domain?

Cuiunbo (Collaborator) commented Jan 18, 2025

Hello! Generally speaking, for any domain where no module has had prior exposure, such as Spanish speech Q&A, you first need to ensure that each module possesses the corresponding foundational capability: Whisper must be able to extract Spanish audio features, the LLM must be able to understand Spanish and reply in Spanish, and ChatTTS must be able to read Spanish aloud. For training these capabilities, end-to-end training is the most efficient optimization strategy and should yield the best results; there is no need for stage-wise training, which would significantly reduce your data-utilization efficiency. Note, however, that training new foundational capabilities requires a lot of data. If you are constrained by GPU resources, I recommend applying LoRA to MiniCPM-o 2.6's LLM while keeping both Whisper and ChatTTS fully trainable.
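The mixed setup described above (LoRA adapters on the LLM, full fine-tuning for the audio encoder and TTS head) can be sketched with toy PyTorch modules. This is a minimal illustration, not MiniCPM-o's actual module layout: the submodule names (`whisper_encoder`, `llm`, `chattts_decoder`) and the hand-rolled LoRA wrapper are placeholders; in practice one would use the `peft` library's `LoraConfig`/`get_peft_model` on the real checkpoint.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # freeze base weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # B starts at zero so the wrapped layer initially equals the base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

def add_lora(module: nn.Module, r: int = 8):
    """Recursively replace every nn.Linear inside `module` with a LoRA wrapper."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LoRALinear(child, r=r))
        else:
            add_lora(child, r=r)

# Toy stand-ins for the three components (names/sizes are illustrative only).
model = nn.ModuleDict({
    "whisper_encoder": nn.Sequential(nn.Linear(80, 64), nn.GELU(), nn.Linear(64, 64)),
    "llm":             nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64)),
    "chattts_decoder": nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 100)),
})

add_lora(model["llm"])  # LLM: frozen base weights + small trainable adapters
# whisper_encoder and chattts_decoder stay fully trainable (default requires_grad=True)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")
```

Passing only the `requires_grad=True` parameters to the optimizer then trains the encoder and TTS head in full while the LLM is steered through its low-rank adapters, which is what keeps the GPU-memory footprint manageable.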

zidsi (Author) commented Jan 19, 2025

Generally speaking, I agree. Could you provide a training example for such SFT? ChatTTS will need to adapt the most, since Whisper and Qwen have seen decent multilingual pre-training data, and images are multilingual ;)

Or maybe TTS part is not trained at all?

Might just wait for tech report. Looking forward.
