Add Speed & Enable HuggingFace model loading #27

Open: wants to merge 3 commits into base: main


Nik-Kras commented Oct 7, 2024

Many issues have been opened about the missing speed parameter, for example #6 and yl4579/StyleTTS#3.

The implementation is simple, and many dubbing projects would benefit from this minor update.
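To illustrate what a speed parameter can do, here is a minimal sketch. It assumes speed is applied StyleTTS2-style by dividing the predicted per-phoneme durations (in frames) before rounding. The helper name `scale_durations` and the clamp to at least one frame are illustrative assumptions, not code taken from this PR.

```python
def scale_durations(durations: list[float], speed: float = 1.0) -> list[int]:
    """Scale predicted per-phoneme durations (in frames) by 1/speed.

    Hypothetical helper: speed > 1 shortens every phoneme (faster speech),
    speed < 1 lengthens every phoneme (slower speech).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Round to whole frames and keep every phoneme at least one frame long.
    return [max(1, round(d / speed)) for d in durations]


# Usage: double the speaking rate by halving each phoneme's duration.
print(scale_durations([4.0, 6.0, 2.0], speed=2.0))  # [2, 3, 1]
```

Because rounding happens per phoneme, the total length only approximately scales by 1/speed.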

--

I also found many checkpoints on Hugging Face, for example: https://huggingface.co/ShoukanLabs/Vokan/resolve/main/Model/epoch_2nd_00012.pth

With this change, you can pass a URL to load any suitable model.
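One way to accept either a local path or a URL is to download the checkpoint to a local cache first and then load it as usual. The sketch below uses only the Python standard library. The names `is_url`, `cache_path_for`, `resolve_checkpoint`, and the `checkpoints` cache directory are hypothetical and not necessarily how this PR does it.

```python
import hashlib
import os
import urllib.parse
import urllib.request


def is_url(path_or_url: str) -> bool:
    """Return True if the argument looks like an HTTP(S) URL."""
    return urllib.parse.urlparse(path_or_url).scheme in ("http", "https")


def cache_path_for(url: str, cache_dir: str) -> str:
    """Map a checkpoint URL to a unique local file name inside cache_dir."""
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    # Prefix with a short URL hash so files that share a name
    # (e.g. epoch_2nd_00012.pth from different repos) do not collide.
    prefix = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return os.path.join(cache_dir, f"{prefix}_{filename}")


def resolve_checkpoint(path_or_url: str, cache_dir: str = "checkpoints") -> str:
    """Return a local checkpoint path, downloading it first if given a URL."""
    if not is_url(path_or_url):
        return path_or_url  # already a local file path
    local_path = cache_path_for(path_or_url, cache_dir)
    if not os.path.exists(local_path):
        os.makedirs(cache_dir, exist_ok=True)
        urllib.request.urlretrieve(path_or_url, local_path)
    return local_path
```

The returned path can then be handed to whatever already loads local checkpoints (for example `torch.load`). Repeated calls with the same URL reuse the cached file instead of downloading it again.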

Nik-Kras (Author) commented Oct 7, 2024

cc @rsxdalv @sidharthrajaram

rsxdalv (Contributor) commented Oct 7, 2024 via email

Nik-Kras (Author) commented Oct 8, 2024

> time to test it out soon

I'm using this updated version in a personal project that generates speech aligned with a reference audio clip. You can use similar code for testing:

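```python
import librosa

def generate_aligned_speech(self, text: str, target_voice_path: str, output_wav_file: str):
    """Generate speech with the intonation and voice of the target recording,
    saying the given text with a duration not exceeding that of the original target audio."""
    # librosa resamples to 22,050 Hz by default; the duration in seconds is unaffected.
    wave, sr = librosa.load(target_voice_path)
    original_duration = len(wave) / sr

    # First pass at normal speed to measure how long the generated speech is.
    out = self.my_tts.inference(
        text=text,
        target_voice_path=target_voice_path,
        output_wav_file=output_wav_file,
        speed=1,
    )
    generated_duration = len(out) / 24_000  # model output sample rate (24 kHz)

    # Second pass: speed > 1 shortens the speech, speed < 1 lengthens it.
    out = self.my_tts.inference(
        text=text,
        target_voice_path=target_voice_path,
        output_wav_file=output_wav_file,
        speed=generated_duration / original_duration,
    )
    return out
```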

Full project: https://github.com/Nik-Kras/voice_ukr_to_eng/tree/main
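The second pass above relies on one ratio: if output duration scales roughly as 1/speed, then speed = generated duration / reference duration stretches or compresses the first pass to the reference length. A minimal sketch of just that calculation follows. The function name `alignment_speed` and the `only_speed_up` option (which never slows speech down, so the result can only get shorter) are hypothetical additions, not part of the project code.

```python
def alignment_speed(
    generated_samples: int,
    reference_duration_s: float,
    sample_rate: int = 24_000,
    only_speed_up: bool = False,
) -> float:
    """Speed factor that rescales a first-pass generation to the reference length.

    Assumes the output duration scales roughly as 1/speed.
    """
    if reference_duration_s <= 0:
        raise ValueError("reference duration must be positive")
    generated_duration_s = generated_samples / sample_rate
    speed = generated_duration_s / reference_duration_s
    # Optionally clamp to 1.0 so speech is never slowed below its natural pace.
    return max(1.0, speed) if only_speed_up else speed


# A 2.0 s first pass (48,000 samples at 24 kHz) against a 1.6 s reference.
print(alignment_speed(48_000, 1.6))  # 1.25
```

Since durations are rounded per phoneme, the second pass lands close to, but not always exactly on, the reference duration.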

2 participants