From e3576fba3eed65b9fa1c635fba591723542bddb5 Mon Sep 17 00:00:00 2001
From: Kunal Kotwani
Date: Tue, 3 Sep 2024 07:21:49 -0700
Subject: [PATCH 1/2] Update known limitations for kNN based indexes (#8137)

* Update known limitations for kNN based indexes

Signed-off-by: Kunal Kotwani

* Update _tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Kunal Kotwani
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 .../availability-and-recovery/snapshots/searchable_snapshot.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
index 4af25004a7..b9e35b2697 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
@@ -108,4 +108,5 @@ The following are known limitations of the searchable snapshots feature:
 - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
 - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
 - For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment.
-- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. For more details on the maximum ratio of remote data, see issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676).
+- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. For more details on the maximum ratio of remote data, see issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676).
+- k-NN native-engine-based indexes using `faiss` and `nmslib` engines are incompatible with searchable snapshots.

From 9e7aedc3d11d52fec60513300786c6d2f9ab97a9 Mon Sep 17 00:00:00 2001
From: kkewwei
Date: Tue, 3 Sep 2024 22:25:13 +0800
Subject: [PATCH 2/2] Update binary.md (#8142)

According to the code, the default value of `hasDocValues` is false
https://github.com/opensearch-project/OpenSearch/blob/03d9a249e47b99b33c6de3625f43b12bef29c1cb/server/src/main/java/org/opensearch/index/mapper/BinaryFieldMapper.java#L85

Signed-off-by: kkewwei
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _field-types/supported-field-types/binary.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_field-types/supported-field-types/binary.md b/_field-types/supported-field-types/binary.md
index d6974ad4cf..99d468c1dc 100644
--- a/_field-types/supported-field-types/binary.md
+++ b/_field-types/supported-field-types/binary.md
@@ -50,5 +50,5 @@ The following table lists the parameters accepted by binary field types. All par
 
 Parameter | Description
 :--- | :---
-`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Optional. Default is `true`.
-`store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the _source field. Optional. Default is `false`.
\ No newline at end of file
+`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Optional. Default is `false`.
+`store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the _source field. Optional. Default is `false`.