Replies: 10 comments
-
Well, there is a 4GB limitation https://en.wikipedia.org/wiki/WAV#Limitations and it is an old standard, going back to 1991. I can't speak as to how things such as ffmpeg would handle such large files. But that's about 6.8 hours of CD-quality (16-bit stereo) audio at 44100Hz, so maybe closer to 10-12 tops at 22050Hz (with headers/padding overheads). For an average book, you're looking at generating 70,000-100,000 words, so 3000-6000 sentences (or wav files, should I say), then trying to combine them into one file. It's quite a technical overhead on a system to do that and I'm not sure where the limitations are. It would take a decent amount of testing and coding to make something like that work. Can I ask what your exact use case is?
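For anyone wanting to check those numbers, here is a rough back-of-envelope sketch. It assumes uncompressed PCM, treats the limit as exactly 2^32 bytes, and ignores the small header; the function name is just for illustration. The ceiling depends on channel count and bit depth as well as sample rate, so the estimates above should be read as rough.

```python
# Rough estimate of how much PCM audio fits under the 4GB WAV limit.
# Assumptions: uncompressed PCM, limit taken as 2**32 bytes, header ignored.
WAV_MAX_BYTES = 2**32  # the RIFF header stores sizes in 32-bit fields


def max_wav_hours(sample_rate, channels=2, bits_per_sample=16):
    """Return the approximate maximum duration in hours for one WAV file."""
    bytes_per_second = sample_rate * channels * (bits_per_sample // 8)
    return WAV_MAX_BYTES / bytes_per_second / 3600


if __name__ == "__main__":
    for rate, channels in [(44100, 2), (22050, 2), (22050, 1), (24000, 1)]:
        hours = max_wav_hours(rate, channels)
        print(f"{rate} Hz, {channels} channel(s): {hours:.1f} hours")
```

With 16-bit stereo at 44100Hz this gives roughly the 6.8 hours quoted on the Wikipedia page; mono output or a lower sample rate stretches the ceiling proportionally.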
-
Oh, that makes sense. It outputs 24k and does 9.5 hours without a problem. So the maximum is somewhere between 10 and 17 hours.
The final combination process was so intensive that my game lagged out and my computer froze for 2 minutes.
Audiobook production. That's my use case. If you have the spare time and energy, a new feature that outputs multiple files from a queue as a batch would help a lot.
Like the batch processing feature that HandBrake has.
-
Perhaps instead of queuing multiple videos, queue multiple text files.
-
With text files we can split them into chapters too!
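A minimal sketch of what chapter splitting could look like, assuming each chapter starts on its own line with a heading such as "Chapter 1" or "Chapter Two: Title". That heading convention and the `split_chapters` name are assumptions for illustration, not anything AllTalk provides.

```python
import re

# Assumption: a chapter heading is a line that begins with the word "Chapter"
# followed by a number or word, e.g. "Chapter 1" or "Chapter Two: The Storm".
CHAPTER_HEADING = re.compile(
    r"^[ \t]*chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE
)


def split_chapters(text):
    """Return a list of (title, body) pairs, one per chapter heading found.

    Text before the first heading (front matter) is dropped. If no heading
    is found, the whole text is returned as a single untitled chapter.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    if not matches:
        return [("Untitled", text.strip())]
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group().strip(), text[match.end():end].strip()))
    return chapters


if __name__ == "__main__":
    book = "Chapter 1\nHello there.\n\nChapter 2: The End\nGoodbye."
    for title, body in split_chapters(book):
        print(title, "->", body)
```

Each chapter body could then be written to its own text file and queued separately, which also keeps every output file well away from any length limit.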
-
Ok. Well, it's not a core feature that has been on my roadmap, and it would take some thinking about how best to split out that much text, send bits off for generation, potentially combine every X generated wavs, then keep going and do a smaller combine at the end. Compiling that much audio into one file, and how to handle that process, is a potentially complicated task. And obviously, the only way I could test the robustness of such a thing would be to write the code, set a machine running for X hours and, if it crashes or errors, change the code and send it off again.

I also don't want to get into feature creep while working on what is a new project, as I do have certain goals. Ultimately, it would probably be best handled by a separate script that calls the AllTalk API. It can deal with any filtering/cleaning and breaking down of text ahead of time, and you could probably feed in a text file or something.

So, I'm not saying no, but equally, I've got too much going on at the moment to jump over to batch queue text generation, and I need to get those core things done first. I'll move this over from issues to discussions so I've got a reference there. I'll mull the problem over and see where I get to with the other bits of my roadmap. That also means that if others discover this conversation, they are welcome to chip in if it's an idea they are interested in.
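As a rough illustration of the "combine every X wavs, then do a smaller combine at the end" idea, here is a sketch using only Python's standard `wave` module. It is not AllTalk code: the function names, the `batch_size` default and the `_partNNNN` file naming are all made up for the example, and it assumes every input file has the same sample rate, channel count and sample width. It streams audio in chunks so memory use stays flat, but it does nothing about the 4GB WAV limit discussed earlier.

```python
import wave
from pathlib import Path


def combine_wavs(paths, out_path, chunk_frames=65536):
    """Concatenate same-format WAV files, streaming a chunk at a time."""
    paths = list(paths)
    if not paths:
        raise ValueError("no input files given")
    fmt = None
    with wave.open(str(out_path), "wb") as out:
        for path in paths:
            with wave.open(str(path), "rb") as src:
                src_fmt = (
                    src.getnchannels(), src.getsampwidth(), src.getframerate()
                )
                if fmt is None:
                    # The first file decides the output format.
                    fmt = src_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif src_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format")
                while True:
                    frames = src.readframes(chunk_frames)
                    if not frames:
                        break
                    # The header is patched with the final size on close.
                    out.writeframesraw(frames)
    return Path(out_path)


def combine_in_stages(paths, out_path, batch_size=200):
    """Combine files in batches, then merge the partial files at the end."""
    paths = [Path(p) for p in paths]
    out_path = Path(out_path)
    partials = []
    for start in range(0, len(paths), batch_size):
        # Hypothetical naming scheme for the intermediate files.
        name = f"{out_path.stem}_part{start // batch_size:04d}.wav"
        partial = out_path.with_name(name)
        combine_wavs(paths[start:start + batch_size], partial)
        partials.append(partial)
    combine_wavs(partials, out_path)
    for partial in partials:
        partial.unlink()  # remove intermediates once the final file exists
    return out_path
```

Keeping the partial files until the final merge succeeds means a crash late in a long run only costs the last stage rather than the whole book.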
-
Thank you for your hard work.
-
Almost done... Not available yet! If you're going to be making money from this, I hope you'll make me a donation at some point ;)
-
Thank you very much. I haven't made a cent yet, because in every hour of generated speech I had around 5 seconds of continuous screeching or rumbling.
I've managed to get it down to 1 second by lowering the parameters. Thank you for that function too.
Once I start making a profit you'll get a percentage share for sure. I'm still not sure about the demand yet, but the market seems to be way overpriced right now.
-
I tried to make a donation when I first started using your app and tried again now, but I can't see a link. Do you have it somewhere else? It's not on your profile.
-
Thanks! Though, you're right, I've got no links at the moment for anything like that... I've yet to figure it out! Guess I've been too busy coding! Here you go though! https://github.com/erew123/alltalk_tts#-alltalk-tts-generator I'm going to mark this closed now, hah!
-
With the DeepSpeed optimisation, we can actually do multiple books in one night!
If you have some spare time, a batch queue would allow us to do so!
Another fantastic reason to do it: somewhere in the process, the maximum length of a wav is under 17 hours. Beyond that, the output shows 17 hours but actually plays for 4.5 hours at most. Maybe it's a limit of ffmpeg, or of the wav file or its header, or something else. I couldn't find it though.
Thank you very much.