From 223a236ddc193c4ec8b64291b719fdfe0c2777ee Mon Sep 17 00:00:00 2001 From: Naveen Tatikonda Date: Mon, 22 Jul 2024 19:51:06 -0500 Subject: [PATCH 01/10] [Doc] Lucene inbuilt scalar quantization in k-NN Signed-off-by: Naveen Tatikonda --- .../knn/knn-vector-quantization.md | 105 ++++++++++++++++++ 1 file changed, 105 insertions(+) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index fe4833ee47..8cc48f58d1 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -17,6 +17,111 @@ OpenSearch supports many varieties of quantization. In general, the level of qua Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +## Lucene scalar quantization + +Starting with version 2.16, the k-NN plugin supports inbuilt scalar quantization for Lucene engine within OpenSearch. Unlike [lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), the scalar quantizer in lucene engine quantizes the input vectors from 32-bit float to 7-bit int vectors without any need for quantizing the vectors outside of OpenSearch. +The scalar quantizer in lucene dynamically quantizes the input 32-bit floating-point vectors into 7-bit integer vectors in each segment using the minQuantile and maxQuantile computed using the `confidence_interval` parameter. + +During search, the query vector is also quantized in each segment using the minQuantile and maxQuantile of that segment to compute the distance against quantized input vectors of that segment. The quantization can decrease the memory footprint by a factor of 4 at the cost of some recall. But, there is a slight increase +in the disk usage due to the overhead of storing the input raw vectors and quantized vectors. + +### Using Lucene scalar quantization + +To use the scalar quantizer, set the k-NN vector field’s `method.parameters.encoder.name` to `sq` when creating a k-NN index: +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "method": { + "name": "hnsw", + "engine": "lucene", + "space_type": "l2", + "parameters": { + "encoder": { + "name": "sq" + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +Optionally, you can specify the parameters in `method.parameters.encoder` shown below: +* `bits` - Determines the size of the quantized vector after quantizing the input float vectors. For instance, 7 bits will quantize input float vectors into 7-bit integer vectors. As of OpenSearch 2.16, only `7` bits are supported. Default value is `7`. +* `confidence_interval` - used to compute the `minQuantile` and `maxQuantile` parameters which are used to quantize the vectors. The accepted values are: + - It can be any value between and including `0.9` to `1.0`. For example, if we set it to `0.9` then it will consider the middle 90% of the vector values for computing the min and max Quantiles excluding the minimum and maximum 5% of the values. + - It can be also set to `0`. 
It is the dynamic confidence interval which will dynamically compute the min and max quantiles with some oversampling and additional computation of input data (unlike the above one which will statically compute using the provided confidence interval). + - By default, when this parameter is not set it will be computed from the dimension of the vector as `Max(0.9, 1 - (1 / (1 + d)))`. + +The lucene scalar quantization doesn't work when `data_type` is set to `byte` or any other data type except `float` (the default type). +{: .warning} + +The following example method definition specifies the Lucene sq encoder with `confidence_interval` as `1.0` which will consider all the input vectors for computing the min and max Quantiles and quantizes them (to `7` bits by default). +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "method": { + "name": "hnsw", + "engine": "lucene", + "space_type": "l2", + "parameters": { + "encoder": { + "name": "sq", + "parameters": { + "confidence_interval": 1.0 + } + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +There are no changes to ingestion or query mapping and no range limitations for the input vectors. + +### Memory estimation + +In the best-case scenario, 7-bit vectors produced by the Lucene scalar quantizer require 25% of the memory that 32-bit vectors require. + +#### HNSW memory estimation + +The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (dimension + 8 * M)` bytes/vector. + +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: + +```bash +1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB +``` + ## Faiss 16-bit scalar quantization Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. From 709428ab4234cc9f3897a6ec5caad0d7e066db98 Mon Sep 17 00:00:00 2001 From: Naveen Tatikonda Date: Wed, 24 Jul 2024 15:11:24 -0500 Subject: [PATCH 02/10] Address Review Comments Signed-off-by: Naveen Tatikonda --- _search-plugins/knn/knn-vector-quantization.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 8cc48f58d1..51559cc7cf 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -61,7 +61,6 @@ PUT /test-index {% include copy-curl.html %} Optionally, you can specify the parameters in `method.parameters.encoder` shown below: -* `bits` - Determines the size of the quantized vector after quantizing the input float vectors. For instance, 7 bits will quantize input float vectors into 7-bit integer vectors. As of OpenSearch 2.16, only `7` bits are supported. Default value is `7`. * `confidence_interval` - used to compute the `minQuantile` and `maxQuantile` parameters which are used to quantize the vectors. The accepted values are: - It can be any value between and including `0.9` to `1.0`. 
For example, if we set it to `0.9` then it will consider the middle 90% of the vector values for computing the min and max Quantiles excluding the minimum and maximum 5% of the values. - It can be also set to `0`. It is the dynamic confidence interval which will dynamically compute the min and max quantiles with some oversampling and additional computation of input data (unlike the above one which will statically compute using the provided confidence interval). From 138ea179624141fb5d8e831fcc2fe4270a4bbe59 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 25 Jul 2024 15:03:26 -0400 Subject: [PATCH 03/10] Doc review Signed-off-by: Fanit Kolchina --- .../styles/Vocab/OpenSearch/Words/accept.txt | 3 +- .../knn/knn-vector-quantization.md | 44 ++++++++++--------- 2 files changed, 25 insertions(+), 22 deletions(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index b588586138..9e09f21c3a 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -93,7 +93,8 @@ pebibyte [Pp]reprocess [Pp]retrain [Pp]seudocode -[Quantiz](e|ation|ing|er) +[Qq]uantiles? +[Qq]uantiz(e|ation|ing|er) [Rr]ebalance [Rr]ebalancing [Rr]edownload diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 51559cc7cf..d1876fd2e4 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -15,19 +15,18 @@ OpenSearch supports many varieties of quantization. In general, the level of qua ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucene engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). ## Lucene scalar quantization -Starting with version 2.16, the k-NN plugin supports inbuilt scalar quantization for Lucene engine within OpenSearch. Unlike [lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), the scalar quantizer in lucene engine quantizes the input vectors from 32-bit float to 7-bit int vectors without any need for quantizing the vectors outside of OpenSearch. -The scalar quantizer in lucene dynamically quantizes the input 32-bit floating-point vectors into 7-bit integer vectors in each segment using the minQuantile and maxQuantile computed using the `confidence_interval` parameter. +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to compress vectors before ingesting the documents, the Lucene scalar quantizer compresses input vectors within OpenSearch during ingestion. 
The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the `confidence_interval` parameter. -During search, the query vector is also quantized in each segment using the minQuantile and maxQuantile of that segment to compute the distance against quantized input vectors of that segment. The quantization can decrease the memory footprint by a factor of 4 at the cost of some recall. But, there is a slight increase -in the disk usage due to the overhead of storing the input raw vectors and quantized vectors. +During search, the query vector is quantized in each segment using the minimum and maximum quantiles of that segment. Then, OpenSearch computes distances from the quantized query vector to the quantized input vectors of that segment. Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. ### Using Lucene scalar quantization -To use the scalar quantizer, set the k-NN vector field’s `method.parameters.encoder.name` to `sq` when creating a k-NN index: +To use the Lucene scalar quantizer, set the k-NN vector field’s `method.parameters.encoder.name` to `sq` when creating a k-NN index: + ```json PUT /test-index { @@ -60,16 +59,19 @@ PUT /test-index ``` {% include copy-curl.html %} -Optionally, you can specify the parameters in `method.parameters.encoder` shown below: -* `confidence_interval` - used to compute the `minQuantile` and `maxQuantile` parameters which are used to quantize the vectors. The accepted values are: - - It can be any value between and including `0.9` to `1.0`. For example, if we set it to `0.9` then it will consider the middle 90% of the vector values for computing the min and max Quantiles excluding the minimum and maximum 5% of the values. - - It can be also set to `0`. It is the dynamic confidence interval which will dynamically compute the min and max quantiles with some oversampling and additional computation of input data (unlike the above one which will statically compute using the provided confidence interval). - - By default, when this parameter is not set it will be computed from the dimension of the vector as `Max(0.9, 1 - (1 / (1 + d)))`. +### Confidence interval + +Optionally, you can specify the `confidence_interval` parameter in the `method.parameters.encoder` object. +The `confidence_interval` is used to compute the minimum and maximum quantiles in order to quantize the vectors: +- If you set the `confidence_interval` to a value in `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` specifies to compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values. +- Setting `confidence_interval` to `0` specifies to compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data. +- When `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$max(0.9, 1 - \frac{1}{1 + d})$$. -The lucene scalar quantization doesn't work when `data_type` is set to `byte` or any other data type except `float` (the default type). +Lucene scalar quantization applies only to `float` vectors. 
If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), Lucene scalar quantization is not applied. {: .warning} -The following example method definition specifies the Lucene sq encoder with `confidence_interval` as `1.0` which will consider all the input vectors for computing the min and max Quantiles and quantizes them (to `7` bits by default). +The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors for computing the minimum and maximum quantiles. Vectors are quantized to 7 bits by default: + ```json PUT /test-index { @@ -109,15 +111,15 @@ There are no changes to ingestion or query mapping and no range limitations for ### Memory estimation -In the best-case scenario, 7-bit vectors produced by the Lucene scalar quantizer require 25% of the memory that 32-bit vectors require. +In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use only 25% of the memory required by 32-bit vectors. #### HNSW memory estimation -The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (dimension + 8 * M)` bytes/vector. +The memory required for the Hierarchical Navigable Small World (HNSW) graph is approximately `1.1 * (dimension + 8 * M)` bytes/vector. As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB ``` @@ -252,7 +254,7 @@ The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB ``` @@ -262,7 +264,7 @@ The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_ve As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB ``` @@ -295,8 +297,8 @@ The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8 As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: -```bash -1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB +```r +1.1 * ((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB ``` #### IVF memory estimation @@ -305,6 +307,6 @@ The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. 
The memory requirement can be estimated as follows: -```bash +```r 1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB ``` From 27634b5bb789b7e3da280bc9e49694200286b13a Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 25 Jul 2024 16:25:50 -0400 Subject: [PATCH 04/10] Clarified M Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index d1876fd2e4..8abc95dfff 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -115,7 +115,7 @@ In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use #### HNSW memory estimation -The memory required for the Hierarchical Navigable Small World (HNSW) graph is approximately `1.1 * (dimension + 8 * M)` bytes/vector. +The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * M)` bytes/vector, where `M` is the maximum number of bi-directional links created for each element during the construction of the graph. As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: From 278ea23f5bb9a8736e8b699fefb2a9a5ff463555 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 25 Jul 2024 16:53:12 -0400 Subject: [PATCH 05/10] Tech review comments Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 8abc95dfff..db0247e28f 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -19,9 +19,9 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucen ## Lucene scalar quantization -Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to compress vectors before ingesting the documents, the Lucene scalar quantizer compresses input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the `confidence_interval` parameter. +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer compresses input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the `confidence_interval` parameter. -During search, the query vector is quantized in each segment using the minimum and maximum quantiles of that segment. Then, OpenSearch computes distances from the quantized query vector to the quantized input vectors of that segment. 
Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. +During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance of the query vector to the segment's quantized input vectors. Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. ### Using Lucene scalar quantization @@ -67,7 +67,7 @@ The `confidence_interval` is used to compute the minimum and maximum quantiles i - Setting `confidence_interval` to `0` specifies to compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data. - When `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$max(0.9, 1 - \frac{1}{1 + d})$$. -Lucene scalar quantization applies only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), Lucene scalar quantization is not applied. +Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), the request is rejected. {: .warning} The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors for computing the minimum and maximum quantiles. Vectors are quantized to 7 bits by default: From 436fb1c39216f75125aabe5e5725292eba317c91 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 25 Jul 2024 16:55:57 -0400 Subject: [PATCH 06/10] One more change Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index db0247e28f..8c5850e0aa 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -19,7 +19,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucen ## Lucene scalar quantization -Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer compresses input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the `confidence_interval` parameter. +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. 
Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the `confidence_interval` parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance of the query vector to the segment's quantized input vectors. Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. From 4d6522049fe6eafd55d667a6f652923227c9d952 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Thu, 25 Jul 2024 17:01:15 -0400 Subject: [PATCH 07/10] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 8c5850e0aa..2820537e39 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -19,7 +19,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucen ## Lucene scalar quantization -Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the `confidence_interval` parameter. +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance of the query vector to the segment's quantized input vectors. Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. 
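For illustration, the static-quantile scheme and the default `confidence_interval` formula described in the patches above can be sketched in a few lines of Python. This is a conceptual sketch only, not the actual Lucene implementation: the per-segment bookkeeping is omitted, and the exact clipping and rounding behavior, along with the function names, are assumptions.

```python
import numpy as np

def compute_quantiles(values: np.ndarray, confidence_interval: float):
    """Statically compute the minimum and maximum quantiles.

    A confidence interval of 0.9 keeps the middle 90% of the values,
    excluding the lowest 5% and the highest 5%.
    """
    tail = (1.0 - confidence_interval) / 2.0
    return np.quantile(values, tail), np.quantile(values, 1.0 - tail)

def default_confidence_interval(dimension: int) -> float:
    """Default when `confidence_interval` is unset: max(0.9, 1 - 1/(1 + d))."""
    return max(0.9, 1.0 - 1.0 / (1.0 + dimension))

def quantize(vectors: np.ndarray, confidence_interval: float) -> np.ndarray:
    """Clip to the quantile range, then map linearly onto 7-bit integers in [0, 127]."""
    min_q, max_q = compute_quantiles(vectors, confidence_interval)
    clipped = np.clip(vectors, min_q, max_q)
    return np.rint((clipped - min_q) / (max_q - min_q) * 127).astype(np.int8)

# 1,000 random 2-dimensional float32 vectors, matching the example mapping above.
vectors = np.random.randn(1000, 2).astype(np.float32)
quantized = quantize(vectors, default_confidence_interval(dimension=2))
print(quantized.min(), quantized.max())  # values fall in [0, 127]
```

Note that the range is defined by quantiles rather than the absolute minimum and maximum values, which keeps a handful of outliers from stretching the 7-bit range.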
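The HNSW memory estimate from the patches above (`1.1 * (dimension + 8 * M)` bytes/vector) can likewise be written as a small helper. The function name and the gigabyte conversion are illustrative assumptions:

```python
def hnsw_memory_gb(num_vectors: int, dimension: int, m: int) -> float:
    """Estimated memory for an HNSW graph over 7-bit quantized vectors.

    Uses the documented estimate of 1.1 * (dimension + 8 * M) bytes per
    vector, where M is the maximum number of bidirectional links created
    for each element during graph construction.
    """
    return 1.1 * (dimension + 8 * m) * num_vectors / 1024**3

# The worked example from the docs: 1 million vectors, dimension 256, M = 16.
print(f"{hnsw_memory_gb(1_000_000, 256, 16):.2f} GB")  # ~0.39 GB, i.e. ~0.4 GB
```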
From 6b10053a39c0ce42e46e87b6365f957f39b58c73 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 25 Jul 2024 17:05:30 -0400 Subject: [PATCH 08/10] Reword search time sentence Signed-off-by: Fanit Kolchina --- _search-plugins/knn/knn-vector-quantization.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 2820537e39..aa8e4a6892 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -19,9 +19,9 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucen ## Lucene scalar quantization -Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment’s minimum and maximum quantiles in order to compute the distance between the query vector and the segment’s quantized input vectors. -During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance of the query vector to the segment's quantized input vectors. Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. +Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. 
### Using Lucene scalar quantization From 64832e823a4e8068fdab560f7872258b754f3ba5 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Thu, 25 Jul 2024 18:19:17 -0400 Subject: [PATCH 09/10] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index aa8e4a6892..958092f85b 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -115,7 +115,7 @@ In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use #### HNSW memory estimation -The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * M)` bytes/vector, where `M` is the maximum number of bi-directional links created for each element during the construction of the graph. +The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * M)` bytes/vector, where `M` is the maximum number of bidirectional links created for each element during the construction of the graph. As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: From 5fc781c134eaa9ac429afbce1a28c2ab2c180b04 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 26 Jul 2024 10:35:52 -0400 Subject: [PATCH 10/10] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/knn/knn-vector-quantization.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 958092f85b..656ce72fd2 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -19,13 +19,13 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucen ## Lucene scalar quantization -Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors within OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment’s minimum and maximum quantiles in order to compute the distance between the query vector and the segment’s quantized input vectors. +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. 
Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors. Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. ### Using Lucene scalar quantization -To use the Lucene scalar quantizer, set the k-NN vector field’s `method.parameters.encoder.name` to `sq` when creating a k-NN index: +To use the Lucene scalar quantizer, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: ```json PUT /test-index @@ -63,14 +63,14 @@ PUT /test-index Optionally, you can specify the `confidence_interval` parameter in the `method.parameters.encoder` object. The `confidence_interval` is used to compute the minimum and maximum quantiles in order to quantize the vectors: -- If you set the `confidence_interval` to a value in `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` specifies to compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values. +- If you set the `confidence_interval` to a value in the `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` specifies to compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values. - Setting `confidence_interval` to `0` specifies to compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data. - When `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$max(0.9, 1 - \frac{1}{1 + d})$$. -Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), the request is rejected. +Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), then the request is rejected. {: .warning} -The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors for computing the minimum and maximum quantiles. 
Vectors are quantized to 7 bits by default: +The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors when computing the minimum and maximum quantiles. Vectors are quantized to 7 bits by default: ```json PUT /test-index