English | 简体中文
You can use miniconda for a lightweight installation of the Python conda environment.
```bash
# Install the virtual environment
conda create -n qllm python=3.11
source activate qllm
cd mm_server
# Install python dependencies
pip install -r requirements.txt
```
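To confirm the environment is set up, you can run a quick optional check (uvicorn is assumed to come from requirements.txt, since it is used to serve the app below):

```bash
# Optional sanity check after activation
python --version   # should report Python 3.11.x
pip show uvicorn   # assumed to be installed via requirements.txt
```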
Edit the model configuration items:
```bash
cp config.py.local config.py
```

Modify the `config.py` configuration file according to your usage:
- `clip_model_name`: visual embedding model / text embedding model (CLIP embeds both images and text)
- `ocr_model_path`: OCR text recognition model
- `hf_embedding_model_name`: text embedding model
- `audio_model_name`: video-to-text transcription model
```python
from dataclasses import dataclass

@dataclass
class Configer:
    # mm server host
    server_host = "localhost"
    server_port = 50110
    # image encoder model name
    clip_model_name = "ViT-B/32"
    # model keep alive time (s)
    clip_keep_alive = 60
    # ocr model path (TEMPLATE_PATH is defined elsewhere in config.py)
    ocr_model_path = str(TEMPLATE_PATH / "services/ocr_server/ocr_models")
    # model keep alive time (s)
    ocr_keep_alive = 60
    # text embedding model name
    hf_embedding_model_name = "BAAI/bge-small-en-v1.5"
    # model keep alive time (s)
    hf_keep_alive = 60
    # video transcription model name
    audio_model_name = "small"
    # model keep alive time (s)
    video_keep_alive = 1
```
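For reference, other modules can then read these settings. A minimal usage sketch (the import path is assumed; the project may wire this differently):

```python
# Minimal sketch -- assumes config.py is on the import path
from config import Configer

cfg = Configer()
print(cfg.server_host, cfg.server_port)  # localhost 50110
```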
Start the service:

```bash
source activate qllm
python main.py
# reload mode: hot-reloads after code changes
# uvicorn main:app --reload --host localhost --port 50110
```
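Once the server is up, you can confirm it is reachable. Since `main:app` is served by uvicorn, a FastAPI app is assumed here, which exposes interactive docs at `/docs` by default:

```bash
# Quick reachability check (assumes the default FastAPI /docs route is enabled)
curl -I http://localhost:50110/docs
```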
Refer to the official instructions for installing Ollama: https://ollama.com
Starting the Local Model

Choose the 8b local model; it is lighter, though there will be some loss in performance compared to the 70b version.
```bash
# Local version:
ollama run llama3:8b-instruct-q4_0
```
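You can verify the model responds before wiring it into the app. The `/api/generate` endpoint is part of Ollama's standard HTTP API; the prompt is just an example:

```bash
# Send a one-off prompt to the locally running model
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b-instruct-q4_0",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```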
(Optional) Local 70B Large Model
For servers, you can choose the 70b large model version. You also need to modify `model_name` in `mmrag_server/config.py` accordingly.
```bash
# Remote server version:
ollama run llama3:70b-instruct
```
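The corresponding change would look roughly like this (an assumed excerpt; only the `model_name` line is shown):

```python
# mmrag_server/config.py (assumed excerpt)
model_name = "llama3:70b-instruct"
```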
(Optional) Model Lifecycle Management

Ollama automatically releases a model after 5 minutes of inactivity and reloads it on the next access. If you need to keep it loaded for a longer period, configure the `keep_alive` parameter, which accepts:
- A duration string (e.g., "10m" or "24h")
- A number of seconds (e.g., 3600)
- Any negative number (e.g., -1 or "-1m") to keep the model loaded permanently
- "0", which unloads the model immediately after generating a response
```bash
# Permanently keep the model loaded
curl http://localhost:11434/api/generate -d '{"model": "MODEL_NAME", "keep_alive": -1}'
# Unload the model
curl http://localhost:11434/api/generate -d '{"model": "MODEL_NAME", "keep_alive": 0}'
# Set auto-release time
curl http://localhost:11434/api/generate -d '{"model": "MODEL_NAME", "keep_alive": 3600}'
```
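Alternatively, recent Ollama versions read the `OLLAMA_KEEP_ALIVE` environment variable as a server-wide default:

```bash
# Server-wide default keep-alive for all models (supported by recent Ollama versions)
OLLAMA_KEEP_ALIVE=24h ollama serve
```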
- Long-Text Model: 524k llama3 version
- High-Quality Open-Source LLM: Llama-3-70b-Instruct
- High-Quality Small Model: llama3:8b-instruct
- Text Embedding Model: bge-small (see the sketch after this list)
- Image Embedding Model: CLIP
- LLM Model Selection: LLM Leaderboard
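As referenced in the list above, the text embedding model can be exercised directly. A minimal sketch, assuming sentence-transformers is installed (the project may load it through its own wrapper):

```python
# Minimal sketch: embed text with BAAI/bge-small-en-v1.5
# Assumes the sentence-transformers package is available.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = model.encode(["A query about an image", "A passage of text"])
print(embeddings.shape)  # (2, 384) -- bge-small produces 384-dimensional vectors
```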