Support for new languages #111

Open
pyai88 opened this issue Jan 11, 2025 · 1 comment

Comments

@pyai88

pyai88 commented Jan 11, 2025

Hello, thank you for the great work. I have a couple of questions:

  1. Should this method work on a new language out-of-the-box? I'm seeing properties from the training language in my output, so I'm wondering if I've made a mistake.
  2. If it doesn't work out-of-the-box, would fine-tuning the pre-trained model be preferable to training from scratch?
  3. To help me estimate the training cost, could you provide guidance on how much data is typically needed for a new language and how long you trained your pre-trained model for?

Thank you in advance.

@Plachtaa
Owner

Hi there,
If the output in an unseen language sounds accented, you can try fine-tuning the current checkpoint on the language you want.
I can't give an estimate of how many hours of data are required; my only suggestion is to use as much as you have.
