From 79a422bc7d8c9eed77a8971ed74abc6edade1bb5 Mon Sep 17 00:00:00 2001 From: Naveen Tatikonda Date: Fri, 26 Jul 2024 14:11:26 -0500 Subject: [PATCH] [Doc] Lucene inbuilt scalar quantization (#7797) * [Doc] Lucene inbuilt scalar quantization in k-NN Signed-off-by: Naveen Tatikonda * Address Review Comments Signed-off-by: Naveen Tatikonda * Doc review Signed-off-by: Fanit Kolchina * Clarified M Signed-off-by: Fanit Kolchina * Tech review comments Signed-off-by: Fanit Kolchina * One more change Signed-off-by: Fanit Kolchina * Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Reword search time sentence Signed-off-by: Fanit Kolchina * Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Naveen Tatikonda Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../styles/Vocab/OpenSearch/Words/accept.txt | 3 +- .../knn/knn-vector-quantization.md | 118 +++++++++++++++++- 2 files changed, 114 insertions(+), 7 deletions(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index b588586138..9e09f21c3a 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -93,7 +93,8 @@ pebibyte [Pp]reprocess [Pp]retrain [Pp]seudocode -[Quantiz](e|ation|ing|er) +[Qq]uantiles? +[Qq]uantiz(e|ation|ing|er) [Rr]ebalance [Rr]ebalancing [Rr]edownload diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index fe4833ee47..656ce72fd2 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -15,7 +15,113 @@ OpenSearch supports many varieties of quantization. In general, the level of qua ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucene engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). + +## Lucene scalar quantization + +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. 
The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles to compute the distance between the query vector and the segment's quantized input vectors.
+
+Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors.
+
+### Using Lucene scalar quantization
+
+To use the Lucene scalar quantizer, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index:
+
+```json
+PUT /test-index
+{
+  "settings": {
+    "index": {
+      "knn": true
+    }
+  },
+  "mappings": {
+    "properties": {
+      "my_vector1": {
+        "type": "knn_vector",
+        "dimension": 2,
+        "method": {
+          "name": "hnsw",
+          "engine": "lucene",
+          "space_type": "l2",
+          "parameters": {
+            "encoder": {
+              "name": "sq"
+            },
+            "ef_construction": 256,
+            "m": 8
+          }
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Confidence interval
+
+Optionally, you can specify the `confidence_interval` parameter in the `method.parameters.encoder` object.
+The `confidence_interval` determines the minimum and maximum quantiles used to quantize the vectors:
+- If you set the `confidence_interval` to a value in the `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` computes the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the lowest 5% and the highest 5% of the values.
+- Setting the `confidence_interval` to `0` computes the quantiles dynamically, which involves oversampling and additional computations on the input data.
+- When the `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$\max\left(0.9, 1 - \frac{1}{1 + d}\right)$$.
+
+Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), then the request is rejected.
+{: .warning}
+
+The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. With this setting, all of the input vectors are considered when computing the minimum and maximum quantiles. Vectors are quantized to 7 bits by default:
+
+```json
+PUT /test-index
+{
+  "settings": {
+    "index": {
+      "knn": true
+    }
+  },
+  "mappings": {
+    "properties": {
+      "my_vector1": {
+        "type": "knn_vector",
+        "dimension": 2,
+        "method": {
+          "name": "hnsw",
+          "engine": "lucene",
+          "space_type": "l2",
+          "parameters": {
+            "encoder": {
+              "name": "sq",
+              "parameters": {
+                "confidence_interval": 1.0
+              }
+            },
+            "ef_construction": 256,
+            "m": 8
+          }
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+There are no changes to ingestion or queries, and there are no range limitations on the input vectors.
+
+### Memory estimation
+
+In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use only 25% of the memory required by 32-bit vectors because each quantized dimension is stored in a single byte rather than as a 4-byte floating-point number.
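+
+For example, in this ideal scenario, the quantized form of 1 million vectors with a dimension of 256 uses about a quarter of the memory required by the 32-bit floating-point originals. This is a rough sketch of the vector storage only, assuming 1 byte per quantized dimension; the HNSW estimate in the following section adds the graph overhead on top of these vector bytes:
+
+```r
+4 * 256 * 1,000,000 ~= 1 GB     # 32-bit floating-point vectors (4 bytes per dimension)
+1 * 256 * 1,000,000 ~= 0.26 GB  # 7-bit quantized vectors (1 byte per dimension)
+```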
+ +#### HNSW memory estimation + +The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * M)` bytes/vector, where `M` is the maximum number of bidirectional links created for each element during the construction of the graph. + +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: + +```r +1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB +``` ## Faiss 16-bit scalar quantization @@ -148,7 +254,7 @@ The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB ``` @@ -158,7 +264,7 @@ The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_ve As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB ``` @@ -191,8 +297,8 @@ The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8 As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: -```bash -1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB +```r +1.1 * ((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB ``` #### IVF memory estimation @@ -201,6 +307,6 @@ The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: -```bash +```r 1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB ```