
How would you use a different TTS program for this? #62

Open
Mike-MW opened this issue Dec 6, 2024 · 4 comments

Comments

@Mike-MW

Mike-MW commented Dec 6, 2024

ElevenLabs is a bit pricey. I'd prefer to use something like Amazon Polly. Yes, it's lower quality, but you get more speech for the price, and I don't exactly have a lot of excess cash to throw around.

@tizu69

tizu69 commented Dec 14, 2024

Have a look at https://github.com/DougDougGithub/Babagaboosh/blob/main/eleven_labs.py. ElevenLabsManager is quite modular, and as long as Polly has an API you should be able to integrate it easily. Sadly, I currently don't have the resources to get Polly working on my machine, so I can't provide tested code, but https://docs.aws.amazon.com/polly/latest/dg/SynthesizeSpeechSamplePython.html plus a library that plays the audio seems like it could help.
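
For reference, here is a rough sketch of what a Polly-backed replacement for ElevenLabsManager might look like. It has not been run against a real AWS account. The class name `PollyManager`, the method name `text_to_audio`, and the default voice are assumptions for illustration, not names taken from the repo. The only AWS call used is boto3's `synthesize_speech`, whose response carries the audio in `AudioStream`. Playing the resulting MP3 is left to whatever audio library the project already uses.

```python
import os
import tempfile


class PollyManager:
    """Hypothetical stand-in for ElevenLabsManager that uses Amazon Polly."""

    def __init__(self, client=None, voice_id="Joanna"):
        if client is None:
            # Imported lazily so the class can also be exercised with a stub client.
            # Needs `pip install boto3` and AWS credentials configured locally.
            import boto3
            client = boto3.client("polly")
        self.client = client
        self.voice_id = voice_id

    def text_to_audio(self, text, out_path=None):
        """Synthesize `text` to an MP3 file and return the file path."""
        response = self.client.synthesize_speech(
            Text=text,
            OutputFormat="mp3",
            VoiceId=self.voice_id,
        )
        # AudioStream is a file-like object; read() returns the raw MP3 bytes.
        audio_bytes = response["AudioStream"].read()
        if out_path is None:
            fd, out_path = tempfile.mkstemp(suffix=".mp3")
            os.close(fd)
        with open(out_path, "wb") as f:
            f.write(audio_bytes)
        return out_path
```

Assuming the rest of the app only needs a path to an audio file, swapping this in would mostly be a matter of replacing the ElevenLabs call site with `PollyManager().text_to_audio(reply_text)`.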

@Mike-MW
Author

Mike-MW commented Dec 14, 2024

Thanks for the reply, but I ended up rebuilding the whole thing in Rust; Python is just such a painful language to deal with. I have a version where the VTT and TTS are both handled by the OpenAI API, so I only need one API key.

@tizu69

tizu69 commented Dec 14, 2024

Is it open source? I think that'd be interesting to check out :)

@Mike-MW
Author

Mike-MW commented Dec 14, 2024

Here is the link: https://github.com/slbsh/chatgpt_slop

The instructions are as follows.
Edit the config to include the following:
your OpenAI API key
global listen: whether you want to be able to activate it from another window
device: the system name of your microphone
backend: set to linux by default; switch it to dshow if you use Windows
keycode: the code of the key that activates recording. If you want a key that's not on a standard keyboard and you have a keyboard or mouse with macro keys, you can use 124-135 for F13-F24
prompt: can span multiple lines, but only if you put """ before and """ after it
message limit: increase or decrease it to control how much of the conversation ChatGPT keeps in memory
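
To make the macro-key range concrete, here is a tiny lookup for the F13-F24 codes. It is in Python purely for illustration (the tool itself is Rust), and it assumes the codes run consecutively from 124 to 135 as described above. `F_KEY_CODES` and `keycode_for` are made-up names, not part of the tool.

```python
# Assumption: F13-F24 map to consecutive keycodes 124-135, as stated in the instructions.
F_KEY_CODES = {f"F{n}": 111 + n for n in range(13, 25)}


def keycode_for(name):
    """Return the keycode for a key name such as "F13" (case-insensitive)."""
    return F_KEY_CODES[name.upper()]
```

For example, `keycode_for("F13")` gives 124 and `keycode_for("f24")` gives 135; whichever value you pick goes into the keycode setting in the config.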

Once you have the config sorted, save and close it. Then open a terminal window by right-clicking in the folder and clicking Terminal, type "cargo run", and hit Enter. It will compile the files, and you can then test it to make sure it's working.

When finished, you can press Ctrl+C to exit. It will show an error when you do so, but don't worry about it.

You can then run "cargo build --release" in the terminal to build a release version inside the "target" folder, with an EXE that takes you straight to the listening part. Before you use it, you must copy the config TOML file into the release folder.

Not the most user-friendly thing in the world, but it's a start.
