[Doc] Lucene inbuilt scalar quantization #7797

Merged · 10 commits · Jul 26, 2024
104 changes: 104 additions & 0 deletions _search-plugins/knn/knn-vector-quantization.md

Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).

## Lucene scalar quantization

Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine within OpenSearch. Unlike a [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires quantizing the vectors outside of OpenSearch, the Lucene scalar quantizer quantizes input vectors from 32-bit floats to 7-bit ints within OpenSearch.

The Lucene scalar quantizer dynamically quantizes the input 32-bit floating-point vectors into 7-bit integer vectors in each segment, using the `minQuantile` and `maxQuantile` values computed from the `confidence_interval` parameter.

During search, the query vector is also quantized in each segment, using that segment's `minQuantile` and `maxQuantile`, in order to compute its distance to the segment's quantized input vectors. Quantization can decrease the memory footprint by a factor of 4 at the cost of some recall. However, disk usage increases slightly because both the raw input vectors and the quantized vectors are stored.
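As an illustration of the mechanism, per-segment scalar quantization can be modeled as follows. This is a simplified sketch, not Lucene's actual implementation; the function name and the naive quantile computation are assumptions for this example:

```python
def quantize_segment(vectors, confidence_interval=0.9, bits=7):
    # Collect all component values in the segment and derive the
    # minimum and maximum quantiles from the confidence interval.
    # (Illustrative only; Lucene's quantile estimation differs.)
    values = sorted(v for vec in vectors for v in vec)
    tail = (1.0 - confidence_interval) / 2.0
    lo = values[int(tail * (len(values) - 1))]
    hi = values[int((1.0 - tail) * (len(values) - 1))]

    levels = (1 << bits) - 1  # 127 distinct levels for 7-bit quantization
    scale = levels / (hi - lo)

    def quantize(vec):
        # Clip each component to [lo, hi], then map it onto [0, 127].
        return [round((min(max(v, lo), hi) - lo) * scale) for v in vec]

    return [quantize(vec) for vec in vectors], lo, hi


# With confidence_interval=1.0, all values are considered: the segment
# minimum maps to 0 and the segment maximum maps to 127.
vectors = [[0.0, 1.0], [0.2, 0.8], [0.5, 0.5]]
quantized, lo, hi = quantize_segment(vectors, confidence_interval=1.0)
print(quantized[0])  # [0, 127]
```

At search time, the query vector would be quantized with the same per-segment `lo` and `hi` before distances are computed.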

### Using Lucene scalar quantization

To use the scalar quantizer, set the k-NN vector field’s `method.parameters.encoder.name` to `sq` when creating a k-NN index:
```json
PUT /test-index
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2,
"method": {
"name": "hnsw",
"engine": "lucene",
"space_type": "l2",
"parameters": {
"encoder": {
"name": "sq"
},
"ef_construction": 256,
"m": 8
}
}
}
}
}
}
```
{% include copy-curl.html %}

Optionally, you can specify the following parameters in `method.parameters.encoder`:

* `confidence_interval` - Used to compute the `minQuantile` and `maxQuantile` parameters for quantizing the vectors. The accepted values are:
- Any value between `0.9` and `1.0`, inclusive. For example, a value of `0.9` uses only the middle 90% of the vector values to compute the minimum and maximum quantiles, excluding the lowest 5% and the highest 5% of the values.

> **Collaborator:** So, to make sure, valid values are 0.9 to 1.0, inclusive (for static computation), or 0 (for dynamic computation)?
>
> **Member Author:** Yes, that's correct. Users can also skip this parameter, in which case it is computed as described in the default case.

- `0`. This enables a dynamic confidence interval, which computes the minimum and maximum quantiles dynamically, using oversampling and additional computation on the input data, rather than statically from a provided confidence interval.

- If this parameter is not set, it is computed by default from the vector dimension `d` as `Max(0.9, 1 - (1 / (1 + d)))`.
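As a quick check of the default, the formula can be evaluated directly (a sketch; the function name is illustrative):

```python
def default_confidence_interval(dimension):
    # Default when `confidence_interval` is not specified:
    # Max(0.9, 1 - (1 / (1 + d)))
    return max(0.9, 1.0 - 1.0 / (1.0 + dimension))

# Low dimensions fall back to the 0.9 floor; higher dimensions
# approach 1.0 and keep nearly the full value range.
print(default_confidence_interval(2))    # 0.9
print(default_confidence_interval(256))  # ~0.9961
```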

Lucene scalar quantization works only when `data_type` is set to `float` (the default). It does not support `byte` or any other data type.

{: .warning}

The following example method definition specifies the Lucene `sq` encoder with a `confidence_interval` of `1.0`. This considers all of the input vector values when computing the minimum and maximum quantiles and quantizes the vectors to 7 bits (the default):
```json
PUT /test-index
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2,
"method": {
"name": "hnsw",
"engine": "lucene",
"space_type": "l2",
"parameters": {
"encoder": {
"name": "sq",
"parameters": {
"confidence_interval": 1.0
}
},
"ef_construction": 256,
"m": 8
}
}
}
}
}
}
```
{% include copy-curl.html %}

There are no changes to ingestion or query mapping and no range limitations for the input vectors.
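For example, documents can be indexed into and queried against the `test-index` defined earlier using the standard k-NN request formats (the document ID and vector values here are arbitrary examples):

```json
PUT /test-index/_doc/1
{
  "my_vector1": [1.5, 2.5]
}
```
{% include copy-curl.html %}

```json
GET /test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [2.0, 3.0],
        "k": 2
      }
    }
  }
}
```
{% include copy-curl.html %}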

### Memory estimation

In the best-case scenario, 7-bit vectors produced by the Lucene scalar quantizer require 25% of the memory that 32-bit vectors require, because each quantized dimension can be stored in a single byte instead of 4 bytes.

#### HNSW memory estimation

The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (dimension + 8 * M)` bytes/vector.
> **Collaborator:** What is M?
>
> **Member Author:** M is the maximum number of connections (the `m` parameter in the method definition). This isn't a new parameter.


As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:

```bash
1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB
```
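The same estimate expressed as a small helper (a sketch; the function name is illustrative):

```python
def hnsw_sq_memory_bytes(num_vectors, dimension, m):
    # 1.1 * (dimension + 8 * M) bytes per vector for Lucene
    # scalar-quantized HNSW: one byte per dimension plus graph links.
    return 1.1 * (dimension + 8 * m) * num_vectors

estimate = hnsw_sq_memory_bytes(1_000_000, 256, 16)
print(estimate / 1024**3)  # ~0.39 GB
```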

## Faiss 16-bit scalar quantization

Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index.