
Fix and enable XTTS streaming #478

Open
wants to merge 9 commits into alltalkbeta
Conversation


@SilyNoMeta SilyNoMeta commented Jan 4, 2025

  • Adds the ability to enable streaming in the XTTS settings (disabled on other engines depending on their capabilities)
  • Uses the state of the streaming flag when calling the OpenAI-compatible Speech API
  • Fixes streaming mode

SilyNoMeta and others added 9 commits January 3, 2025 18:32
* adds langdetect as requirement for colab, standalone and textgen
* adds "auto" to the language dropdown in the Advanced Engine/Model Settings panel
* replace the hardcoded "en" by "auto" when called by the OpenAI compatible Speech API
Add initial support for pickletensor models to F5-TTS
@erew123
Owner

erew123 commented Jan 7, 2025

Hi @SilyNoMeta

As you may have noticed, there is a GitHub merge/sequencing issue going on here with the next 4 PRs you sent, all of which seemingly touch tts_server.py. It should be easy enough to sort out, but I am looking more closely at the code changes before I pull anything in. I push everything to a staging area first and then up to alltalkbeta.

That aside, I have two questions for you on this update:

  1. Is there any reason you set the central generate function to pass `None` for the output file name? It doesn't matter much either way; the original code should behave exactly the same as your changes, just in fewer lines (it will add a file name if one exists, but that won't matter for streaming), so it's just more compact. I was wondering whether there was a specific issue you encountered?

text, voice, language, temperature, repetition_penalty, speed, pitch, output_file=None, streaming=True
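For illustration only (the function and names below are hypothetical stand-ins, not AllTalk's actual API), the point about the signature is that a default of `output_file=None` already covers both call paths, so the streaming caller can simply omit it:

```python
# Hypothetical sketch: with output_file defaulting to None, the streaming
# path never needs to pass it explicitly, and the file path still behaves
# exactly as before when a name is supplied.
def generate(text, output_file=None, streaming=False):
    if streaming:
        return f"streaming:{text}"  # output_file is simply unused here
    return f"wrote '{text}' to {output_file or 'outputs/output.wav'}"

print(generate("hello", streaming=True))         # streaming:hello
print(generate("hello", output_file="out.wav"))  # wrote 'hello' to out.wav
```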

  2. I see you are defining a new variable, current_model_engine, for the model_engine class:

alltalk_tts/tts_server.py

Lines 1129 to 1130 in c83faf9

# Load current model engine configuration
current_model_engine = tts_class()

but it's already pulled in as model_engine:

model_engine = tts_class()

Is this because you are attempting to re-load the variables from the actual underlying engine on each run, in case the mapped voice changed? If so, I'm probably going to move this back to an update of model_engine, just to keep the variable names consistent throughout the script. I'm just checking that this is what I think you are doing, or whether there was some other reason/issue you encountered.

Sorry to have to ask, but I like to make sure I understand why the code does certain things. I also have a huge update about 80% done that I will have to merge in after all these new PRs, with quite a few changes to TTS generation and a new RVC pipeline, so I need to be certain about the core functionality of the generate functions.

Thanks

@SilyNoMeta
Author

Hey!
I don't have a lot of time at the moment, but I'll do my best to explain why I made these changes a few days ago.

Most of the changes in tts_server.py and model_engine.py are, as you said, not necessary, so you shouldn't bother merging them if they conflict with your working branches.

What actually happened is that, while trying to enable streaming support through my new settings, I got errors. So I started looking at the code and refactored it in a way that was more readable for me (and perhaps me only 😆). As I had a few more ideas in mind once it was working, I didn't think about reverting it and just carried on, creating a new branch for something else.

The "true" fix ended up being the addition of the StreamingResponse when the new flag was set on the OpenAI Speech API compatible webservice :


alltalk_tts/tts_server.py

Lines 1150 to 1156 in c83faf9

if current_model_engine.streaming_enabled:
    audio_stream = await generate_audio(
        cleaned_string, mapped_voice, "auto", current_model_engine.temperature_set,
        float(str(current_model_engine.repetitionpenalty_set).replace(',', '.')),
        speed, current_model_engine.pitch_set,
        output_file=None, streaming=True
    )
    return StreamingResponse(audio_stream, media_type="audio/wav")
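As a rough, self-contained sketch of what this path does (the generator and its byte contents are stand-ins, not AllTalk's real internals): generate_audio hands back an async generator of WAV byte chunks, and StreamingResponse iterates it, forwarding each chunk to the client as it is produced instead of waiting for the whole file:

```python
# Minimal stand-alone illustration of the streaming pattern, using only
# asyncio. A real StreamingResponse consumes the generator the same way
# the consume() helper below does.
import asyncio

async def generate_audio_stream(text: str):
    # Stand-in for the real engine: yield fixed-size byte chunks.
    fake_wav = b"RIFF" + text.encode() * 4
    for i in range(0, len(fake_wav), 8):
        yield fake_wav[i:i + 8]
        await asyncio.sleep(0)  # hand control back to the event loop

async def consume():
    # StreamingResponse does essentially this internally.
    chunks = []
    async for chunk in generate_audio_stream("hello"):
        chunks.append(chunk)
    return b"".join(chunks)

result = asyncio.run(consume())
print(result[:4])  # b'RIFF'
```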

As for the model engine redefinition: what happened was that when I was playing with the GUI, the newly saved settings were not used when I tested the API.
So I debugged what was in the model_engine variable, and it contained the settings from when I launched the app, not the newly saved ones (I don't know if I'm being clear...).
Honestly, I didn't understand why and didn't look into it very much. Perhaps when we save the new configuration in the GUI, this variable is not updated correctly?
The easy solution for me was to re-load it, as you saw here, but I can't deny it might not be a "production-worthy" fix!


alltalk_tts/tts_server.py

Lines 1129 to 1130 in c83faf9

# Load current model engine configuration
current_model_engine = tts_class()
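A minimal sketch of the stale-settings behaviour described above (the config file, its format, and TtsEngine are all invented for illustration; the stand-in simply reads its settings once, at construction time, the way a module-level instance would):

```python
# Why a module-level engine instance can hold stale settings: if the class
# reads the saved config in __init__, an instance created at app launch
# never sees values saved later (e.g. from the GUI). Re-instantiating
# re-reads the file, which is what current_model_engine = tts_class() does.
import json
import os
import tempfile

fd, CONFIG_PATH = tempfile.mkstemp(suffix=".json")
os.close(fd)

class TtsEngine:
    """Stand-in for tts_class: reads the saved config once, at construction."""
    def __init__(self):
        with open(CONFIG_PATH) as f:
            self.streaming_enabled = json.load(f)["streaming_enabled"]

# At app launch the config says streaming is off.
with open(CONFIG_PATH, "w") as f:
    json.dump({"streaming_enabled": False}, f)
model_engine = TtsEngine()  # module-level instance, created once

# Later, the GUI saves new settings to disk.
with open(CONFIG_PATH, "w") as f:
    json.dump({"streaming_enabled": True}, f)

print(model_engine.streaming_enabled)          # False: instance is stale
current_model_engine = TtsEngine()             # re-instantiating re-reads the file
print(current_model_engine.streaming_enabled)  # True: fresh settings
os.remove(CONFIG_PATH)
```

The cleaner long-term fix is probably what erew123 suggests: refresh the fields on the existing model_engine instance rather than constructing a second one.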

Good luck with your work! I'm hyped now!! 🍿
