From 3acacb15c7982873c9a73ee0cf255b5c6d1d9683 Mon Sep 17 00:00:00 2001
From: Ce Gao
Date: Thu, 11 Apr 2024 14:38:37 +0800
Subject: [PATCH] chore: Refine readme

Signed-off-by: Ce Gao
---
 README.md            | 34 +++++++++++++++++-----------------
 docs/images/arch.svg |  3 +++
 2 files changed, 20 insertions(+), 17 deletions(-)
 create mode 100644 docs/images/arch.svg

diff --git a/README.md b/README.md
index cb3f7ec..fa1b4b5 100644
--- a/README.md
+++ b/README.md
@@ -4,28 +4,24 @@
 discord invitation link
 trackgit-views
 
-End-to-end service to query the text with hybrid search and rerank.
+QText is a microservices framework for building RAG pipelines or semantic search engines on top of Postgres. It provides a simple API to add, query, and highlight text in your existing database.
 
-Application scenarios:
-- Personal knowledge database + search engine
-- Rerank experiment and visualization
-- RAG pipeline
+The main features include:
+
+- Full-text search with a Postgres GIN index.
+- Vector and sparse vector search with [pgvecto.rs](https://github.com/tensorchord/pgvecto.rs).
+- Reranking with a cross-encoder model, the Cohere Rerank API, or other methods.
+- Semantic highlighting.
+
+QText also provides a dashboard to visualize vector search, sparse vector search, full-text search, and reranking results.
 
 [![asciicast](https://asciinema.org/a/653540.svg)](https://asciinema.org/a/653540)
 
-## Features
+## Design goals
 
-- [x] full text search (Postgres GIN + text search)
-- [x] vector similarity search ([pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) HNSW)
-- [x] sparse search ([pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) HNSW)
-- [x] generate vector and sparse vector if not provided
-- [x] reranking
-- [x] semantic highlight
-- [x] hybrid search explanation
-- [x] TUI
-- [x] OpenAPI
-- [x] OpenMetrics
-- [ ] filtering
+- **Simple**: easy to deploy and use.
+- **Customizable**: can be integrated into your existing databases.
+- **Extensible**: can be extended with new features.
 
 ## How to use
 
@@ -41,6 +37,10 @@ Some of the dependent services can be opt-out:
 - `highlight`: used to provide the semantic highlight feature
 - `encoder`: rerank with cross-encoder model, you can choose other methods or other online services
 
+
+![arch](docs/images/arch.svg)
+
+
 For the client example, check:
 - [test.py](./test.py): simple demo.
 - [test_cohere_wiki.py](./test_cohere_wiki.py): a Wikipedia dataset with Cohere embedding.
diff --git a/docs/images/arch.svg b/docs/images/arch.svg
new file mode 100644
index 0000000..bda8f11
--- /dev/null
+++ b/docs/images/arch.svg
@@ -0,0 +1,3 @@
+
+
+[architecture diagram; SVG text labels: Reranker, Postgres (pgvecto.rs, GIN), This is a query, QText, Embedding Models (Sparse, Dense), Results]
\ No newline at end of file