From 3a2a48783596fccccbb26be35beed87a18f1351b Mon Sep 17 00:00:00 2001
From: michaelfeil
Date: Tue, 12 Nov 2024 21:52:18 -0800
Subject: [PATCH] update infinity

---
 README.md | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index b89281f6..c4bb0309 100644
--- a/README.md
+++ b/README.md
@@ -55,9 +55,7 @@ Infinity is a high-throughput, low-latency REST API for serving text-embeddings,
 - [2024/01] TensorRT / ONNX inference
 - [2023/10] Initial release
 
-
 ## Getting started
-
 ### Launch the cli via pip install
 ```bash
 pip install infinity-emb[all]
@@ -71,7 +69,6 @@ Check the `v2 --help` command to get a description for all parameters.
 ```bash
 infinity_emb v2 --help
 ```
-
 ### Launch the CLI using a pre-built docker container (recommended)
 Instead of installing the CLI via pip, you may also use docker to run `michaelf34/infinity`.
 Make sure you mount your accelerator ( i.e. install `nvidia-docker` and activate with `--gpus all`).
@@ -202,9 +199,8 @@ The cache path at inside the docker container is set by the environment variable
 ### Supported Tasks and Models by Infinity
 
-Infinity aims to be the inference server supporting most functionality for embeddings, reranking and related RAG tasks.
-
-The following tasks and tested example models are supported. Infinity tests 15+ architectures and all of the below cases in the Github CI.
+Infinity aims to be the inference server supporting most functionality for embeddings, reranking and related RAG tasks. Infinity tests 15+ architectures and all of the cases below in the GitHub CI.
+Click on the sections below to find tasks and **validated example models**.
 Text Embeddings
@@ -218,12 +214,19 @@ The following tasks and tested example models are supported. Infinity tests 15+
   - [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
   - [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5)
   - [jinaai/jina-embeddings-v2-base-code](https://huggingface.co/jinaai/jina-embeddings-v2-base-code)
+  - [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
   - [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
-  - limited support for decoder=based models, e.g. Qwen / Mistral7B. See [Alibaba-NLP/gte-Qwen2-1.5B-instruct manual](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct/discussions/20). Keep in mind that they are ~20-100x larger (&slower) than bert-small models.
+  - [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small)
+  - [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3)
+  - [BAAI/bge-m3, no sparse](https://huggingface.co/BAAI/bge-m3)
+  - decoder-based models. Keep in mind that they are ~20-100x larger (&slower) than bert-small models:
+    - [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct/discussions/20)
+    - [Salesforce/SFR-Embedding-2_R](https://huggingface.co/Salesforce/SFR-Embedding-2_R/discussions/6)
+    - [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct/discussions/39)
 
   Other models:
   - Most embedding model are likely supported: https://huggingface.co/models?pipeline_tag=feature-extraction&other=text-embeddings-inference&sort=trending
-  - Check MTEB leaderboard for models https://huggingface.co/spaces/mteb/leaderboard . Note: Most high ranking models are very large models which are expensive to run at scale for marginal accuracy improvements.
+  - Check MTEB leaderboard for models https://huggingface.co/spaces/mteb/leaderboard.
@@ -234,6 +237,7 @@ The following tasks and tested example models are supported. Infinity tests 15+
   Tested reranking models:
   - [mixedbread-ai/mxbai-rerank-xsmall-v1](https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1)
   - [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base)
+  - [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large)
   - [jinaai/jina-reranker-v1-turbo-en](https://huggingface.co/jinaai/jina-reranker-v1-turbo-en)
   - [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3)
@@ -254,7 +258,8 @@ The following tasks and tested example models are supported. Infinity tests 15+
   Tested image<->text models:
   - [wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M](https://huggingface.co/wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M)
   - [jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1)
-  - Models of type: ClipModel
+  - [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384)
+  - Models of type: ClipModel / SiglipModel in `config.json`
 
   Tested audio<->text models:
   - [Clap Models from LAION](https://huggingface.co/collections/laion/clap-contrastive-language-audio-pretraining-65415c0b18373b607262a490)
@@ -277,6 +282,8 @@ The following tasks and tested example models are supported. Infinity tests 15+
   - [colbert-ir/colbertv2.0](https://huggingface.co/colbert-ir/colbertv2.0)
   - [jinaai/jina-colbert-v2](https://huggingface.co/jinaai/jina-colbert-v2)
   - [mixedbread-ai/mxbai-colbert-large-v1](https://huggingface.co/mixedbread-ai/mxbai-colbert-large-v1)
+  - [answerai-colbert-small-v1 - click link for instructions](https://huggingface.co/answerdotai/answerai-colbert-small-v1/discussions/14)
+
@@ -287,7 +294,7 @@ The following tasks and tested example models are supported. Infinity tests 15+
   Example notebook: https://colab.research.google.com/drive/14FqLc0N_z92_VgL_zygWV5pJZkaskyk7?usp=sharing
 
   Tested ColPali/ColQwen models:
-  - [michaelfeil/colpali-v1.2-merged](https://huggingface.co/michaelfeil/colpali-v1.2-merged)
+  - [vidore/colpali-v1.2-merged](https://huggingface.co/michaelfeil/colpali-v1.2-merged)
   - [michaelfeil/colqwen2-v0.1](https://huggingface.co/michaelfeil/colqwen2-v0.1)
   - No lora adapters supported, only "merged" models.
@@ -299,7 +306,7 @@ The following tasks and tested example models are supported. Infinity tests 15+
   Tested models:
   - [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert), financial news classification
   - [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions), text to emotion categories.
-  - bert-models with more than 1 label.
+  - bert-style text-classification models with more than 1 label in `config.json`
 
 ### Infinity usage via the Python API