Commit

Update Documentation

kimjammer committed May 23, 2024
1 parent 2a39603 commit 386a9e4
Showing 5 changed files with 20 additions and 10 deletions.
17 changes: 10 additions & 7 deletions README.md
@@ -12,7 +12,7 @@ The original version was also created in only 7 days, so it is not exactly very
- Audio File playback (for pre-generated songs/covers created with something like [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
- Vtube Studio Plugin & Model/Prop control
- Flexible LLM - Load any model into text-generation-webui (tested) or use any openai-compatible endpoint (not tested).
-- Memory - Long-term (persists across restarts) memories can be manually added, but they will also be
+- Memory/RAG - Long-term (persists across restarts) memories can be manually added, but they will also be
automatically generated as the AI talks. (See memories/readme.md for details)

## Architecture
@@ -112,20 +112,23 @@ documentation [here](https://pytwitchapi.dev/en/stable/index.html#user-authentic

### This Project

-A virtual environment of some sort is recommended (Python 3.11); this project was developed with venv.
+A virtual environment of some sort is recommended (Python 3.11 required); this project was developed with venv.

-Install requirements.txt (This is just a pip freeze, so if you're not on windows watch out)
+Install the CUDA 11.8 version of pytorch 2.2.2 first.

-DeepSpeed (For TTS) will probably need to be installed separately, I was using instructions
-from [AllTalkTTS](https://github.com/erew123/alltalk_tts?#-deepspeed-installation-options) , and using their
+Then install requirements.txt (This is just a pip freeze, so if you're not on Windows watch out)
+
+Finally, DeepSpeed (For TTS) will need to be installed separately. I was using instructions
+from [AllTalkTTS](https://github.com/erew123/alltalk_tts?#-deepspeed-installation-options), and using their
 [provided wheels](https://github.com/erew123/alltalk_tts/releases/tag/DeepSpeed-14.0).

-Create an .env file using .env.example as reference. You need your Twitch app id and secret.
+Create an .env file using .env.example as reference. You need your Twitch app ID and secret, along with your
+Hugging Face token if you use a gated model (like Llama 3).
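As a sketch, the resulting .env might look like the fragment below. The exact variable names come from the repository's .env.example; the ones shown here are illustrative placeholders, not the project's actual keys:

```
# Hypothetical layout - check .env.example for the real variable names
TWITCH_APP_ID=your_twitch_app_id
TWITCH_APP_SECRET=your_twitch_app_secret
HF_TOKEN=hf_your_huggingface_token
```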

Place a voice reference wav file in the voices directory. It should be 5~30 seconds long. For details see the RealtimeTTS
repository.
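To verify a candidate clip meets the suggested 5~30 second length, a small stdlib-only check can help. This helper is hypothetical (it is not part of the repository's utils/) and assumes a plain PCM WAV file:

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str, lo: float = 5.0, hi: float = 30.0) -> bool:
    """True if the clip length falls inside the suggested 5-30 s range."""
    return lo <= clip_duration_seconds(path) <= hi
```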

-Find your desired microphone and speaker device numbers by running utils/listAudioDevices.py and note its number.
+Find your desired microphone and speaker device numbers by running utils/listAudioDevices.py and note their numbers.
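A listing script of this kind typically enumerates devices and prints one line per device. The sketch below assumes the sounddevice package; the project's actual utils/listAudioDevices.py may use a different audio library, and the formatting helper is hypothetical:

```python
try:
    # Assumption: sounddevice is one common way to enumerate audio devices;
    # the project's actual script may depend on something else.
    import sounddevice as sd
except ImportError:
    sd = None


def describe_device(index: int, info: dict) -> str:
    """Format one device entry the way a listing script might print it."""
    return (f"{index}: {info['name']} "
            f"(inputs={info['max_input_channels']}, "
            f"outputs={info['max_output_channels']})")


if __name__ == "__main__" and sd is not None:
    for i, dev in enumerate(sd.query_devices()):
        print(describe_device(i, dev))
```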

Configure constants.py.

9 changes: 8 additions & 1 deletion constants.py
@@ -1,10 +1,13 @@
# This file holds various constants used in the program
# Variables marked with #UNIQUE# will be unique to your setup and NEED to be changed or the program will not work correctly.

# CORE SECTION: All constants in this section are necessary

# Microphone/Speaker device indices
# Use utils/listAudioDevices.py to find the correct device ID
#UNIQUE#
INPUT_DEVICE_INDEX = 1
-OUTPUT_DEVICE_INDEX = 12
+OUTPUT_DEVICE_INDEX = 7

# How many seconds to wait before prompting AI
PATIENCE = 60
@@ -17,9 +20,11 @@
TWITCH_MAX_MESSAGE_LENGTH = 300

# Twitch channel for bot to join
#UNIQUE#
TWITCH_CHANNEL = "lunasparkai"

# Voice reference file for TTS
#UNIQUE#
VOICE_REFERENCE = "neuro.wav"

# MULTIMODAL SPECIFIC SECTION: Not needed when not using multimodal capabilities
@@ -34,12 +39,14 @@
# LLM SPECIFIC SECTION: Below are constants that are specific to the LLM you are using

# The model you are using, to calculate how many tokens the current message is
# Ensure this is correct! Used for token count estimation
MODEL = "meta-llama/Meta-Llama-3-8B"

# Context size (maximum number of tokens in the prompt). Will target up to 90% usage of this limit
CONTEXT_SIZE = 8192
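Exact token counts depend on the tokenizer for MODEL; the 90%-of-context targeting itself can be sketched independently. The helper below is illustrative and uses a word-count stand-in for a real tokenizer:

```python
CONTEXT_SIZE = 8192
TARGET_FRACTION = 0.9  # target up to 90% of the context window


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: real code would use the tokenizer for MODEL."""
    return len(text.split())


def trim_history(messages: list[str]) -> list[str]:
    """Drop oldest messages until the prompt fits the token budget."""
    budget = int(CONTEXT_SIZE * TARGET_FRACTION)
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```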

# This is your name
#UNIQUE#
HOST_NAME = "John"

# This is the AI's name
2 changes: 1 addition & 1 deletion main.py
@@ -46,7 +46,7 @@ def signal_handler(sig, frame):
stt = STT(signals)
# Create TTS
tts = TTS(signals)
-# Create LLMController
+# Create LLMWrapper
llm_wrapper = LLMWrapper(signals, tts, modules)
# Create Prompter
prompter = Prompter(signals, llm_wrapper)
2 changes: 1 addition & 1 deletion memories/readme.md
@@ -5,7 +5,7 @@ will be automatically injected into the prompt. Memories will also persist acros
the frontend or the database is deleted.

The automatically generated memories are based off of
-("Generative Agents: Interactive Simulacra of Human Behavior")[https://arxiv.org/abs/2304.03442]. Essentially,
+[Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/abs/2304.03442). Essentially,
every handful of messages, the LLM will be prompted to review the recent messages and come up with the 3 most high level
questions that encapsulate the conversation and also provide the answer. These question/answer pairs are then each
stored as a (short-term) memory. These short-term memories will persist across restarts unless deleted.
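The reflection step described above (every handful of messages, ask the LLM for the 3 most high-level question/answer pairs) can be sketched as a prompt builder. The wording and function name here are illustrative, not the project's:

```python
def build_reflection_prompt(recent_messages: list[str],
                            num_questions: int = 3) -> str:
    """Ask the LLM to distill recent chat into high-level Q/A pairs."""
    transcript = "\n".join(recent_messages)
    return (
        f"Review the conversation below and write the {num_questions} most "
        f"high-level questions that encapsulate it, each with its answer.\n\n"
        f"{transcript}"
    )
```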
Binary file modified requirements.txt
Binary file not shown.