Add Speed & Enable HuggingFace model loading #27

Nik-Kras · 2024-10-07T14:15:08Z

Several open issues ask for a speed parameter, for example #6 and yl4579/StyleTTS#3.

The implementation is rather simple and many dubbing projects will benefit from this minor update.

--

Also, I found many checkpoints on HuggingFace, like this one: https://huggingface.co/ShoukanLabs/Vokan/resolve/main/Model/epoch_2nd_00012.pth

With this change you can pass such a URL to load any compatible model.
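For reviewers, here is a minimal sketch of how accepting either a local path or a URL could work. It is not the code in this PR: `is_url`, `resolve_checkpoint`, and the `checkpoints` cache directory are hypothetical names used only for illustration, and the download uses the standard library rather than any HuggingFace client.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def is_url(path_or_url: str) -> bool:
    """Return True for http(s) URLs, False for anything else (e.g. local paths)."""
    return urlparse(path_or_url).scheme in ("http", "https")


def resolve_checkpoint(path_or_url: str, cache_dir: str = "checkpoints") -> str:
    """Hypothetical helper: return a local file path for a checkpoint.

    Local paths are returned unchanged. URLs are downloaded once into
    cache_dir and reused on later calls.
    """
    if not is_url(path_or_url):
        return path_or_url

    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)

    # Name the cached file after the last component of the URL path
    local_file = cache / Path(urlparse(path_or_url).path).name
    if not local_file.exists():
        urlretrieve(path_or_url, str(local_file))
    return str(local_file)
```

The returned path can then be handed to whatever loads the checkpoint, so the rest of the loading code does not need to know whether the model came from disk or from a URL.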

Nik-Kras · 2024-10-07T14:17:10Z

cc @rsxdalv @sidharthrajaram

rsxdalv · 2024-10-07T18:39:05Z

The idea looks good. If I have time to test it out soon, I will give it a go. Thanks for tagging me.


Nik-Kras · 2024-10-08T09:33:23Z

> time to test it out soon

I use this updated version in a personal project, where I generate speech whose duration is aligned with the reference audio. You can use similar code for testing:

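import librosa

# Method excerpt from a class that holds the TTS object as self.my_tts
def generate_aligned_speech(self, text: str, target_voice_path: str, output_wav_file: str):
    """Generate speech with the intonation and voice of the target audio, saying the given text with a duration not exceeding the original target audio."""
    # Duration of the reference recording, in seconds
    wave, sr = librosa.load(target_voice_path)
    original_duration = len(wave) / sr

    # First pass at normal speed, to measure how long the generated speech is
    out = self.my_tts.inference(
        text=text,
        target_voice_path=target_voice_path,
        output_wav_file=output_wav_file,
        speed=1
    )
    generated_duration = len(out) / 24_000  # output is assumed to be sampled at 24 kHz

    # Second pass: rescale the speed so the output fits the reference duration
    out = self.my_tts.inference(
        text=text,
        target_voice_path=target_voice_path,
        output_wav_file=output_wav_file,
        speed=generated_duration / original_duration
    )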
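The ratio passed as `speed` in the second call can be checked in isolation. The sketch below assumes that `speed` scales duration inversely (so `speed=2` would roughly halve the output length), which is what the snippet above relies on. The helper names `duration_seconds` and `speed_for_target_duration` are hypothetical and not part of the library.

```python
def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of an audio buffer in seconds."""
    return num_samples / sample_rate


def speed_for_target_duration(generated_duration: float, target_duration: float) -> float:
    """Speed factor that would make the generated speech last target_duration.

    Assumes output duration is inversely proportional to speed.
    """
    if generated_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    return generated_duration / target_duration


# 144,000 samples at 24 kHz is 6 s of generated speech; the reference lasts 4 s,
# so the second pass needs to run 1.5x faster.
generated = duration_seconds(144_000, 24_000)
print(speed_for_target_duration(generated, 4.0))  # 1.5
```

A ratio below 1 slows the speech down, so when the first pass is shorter than the reference the second pass is stretched rather than compressed.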

Full project: https://github.com/Nik-Kras/voice_ukr_to_eng/tree/main

Nik-Kras added 3 commits October 7, 2024 14:02

added speed param

6c1f74d

updated model loading

eb163cd

updated doc string for model loading

157956b
