[Backport 2.x] Add expand_nested_docs Parameter support to NMSLIB engine #2340

Closed
wants to merge 416 commits into from

Conversation

heemin32
Collaborator

Backport d31149c from #2331

opensearch-trigger-bot bot and others added 30 commits November 27, 2023 10:04
#1324)

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 5e2f899)

Co-authored-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 2e3ab95)

Signed-off-by: Junqiu Lei <[email protected]>
…odels when nodes crash or leave cluster (#1348)

* Properly designate model state for actively training models when nodes crash or leave cluster

Signed-off-by: Ryan Bogan <[email protected]>

* Fix merge conflict

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
* Increase Lucene max dimension limit to 16,000

Signed-off-by: Junqiu Lei <[email protected]>
(cherry picked from commit 083ea2b)

Co-authored-by: Junqiu Lei <[email protected]>
…exing and search performance (#1353) (#1362)

Signed-off-by: Navneet Verma <[email protected]>
Changes how security tests are executed. Instead of setting up docker
container with security enabled, we now can directly spin up a gradle
local cluster with security which we can use to run tests against. To
enable this option, we just have to pass `-Dsecurity.enabled=true` as a
flag.

Along with this, some refactoring was done for the ODFERestTestCase for
configuring the client and cleaning up.

Signed-off-by: John Mazanec <[email protected]>
* Fix flaky tests

Signed-off-by: Ryan Bogan <[email protected]>

* Minor change

Signed-off-by: Ryan Bogan <[email protected]>

* Add necessary imports

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
Recently, we have seen that
TrainingJobRouteDecisionInfoTransportActionTests has been having
failures on Windows. The failures are related to an uninitialized cluster
state. This does not have anything to do with the test itself. Most
likely, it is the result of state dependence that happens with
KNNSingleNodeTestCase.

This change refactors the class to use mocks and a lighter weight base
class, KNNTestCase.

Signed-off-by: John Mazanec <[email protected]>
(cherry picked from commit 5c24d99)
…) (#1377)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 271df52)

Co-authored-by: Navneet Verma <[email protected]>
* Add Lucene Codec 9.9

Signed-off-by: Naveen Tatikonda <[email protected]>

* Fix import statements for Lucene95 Codec

Signed-off-by: Naveen Tatikonda <[email protected]>

* Fix SegmentInfo Constructor in Test

Signed-off-by: Naveen Tatikonda <[email protected]>

* Temporarily Ignore Old Codec Tests

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Delete Old Codec Tests

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 45e9e54)
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.
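As a rough illustration of the deduplication described above, the sketch below collapses child-document hits onto their parents and keeps the best score per parent before taking the top k. It is a post-hoc approximation written for clarity only: the actual change applies the parentId filter inside the Faiss HNSW search, and the names `top_k_parents`, `hits`, and `parent_of` are hypothetical, not part of the plugin or of Faiss.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_parents(
    hits: List[Tuple[int, float]],
    parent_of: Dict[int, int],
    k: int,
) -> List[Tuple[int, float]]:
    """Collapse (child_id, score) hits to at most k (parent_id, score) results.

    Assumes a higher score means a better match. `parent_of` is a
    hypothetical mapping from nested (child) doc ID to parent doc ID.
    """
    best: Dict[int, float] = {}
    for child_id, score in hits:
        parent_id = parent_of[child_id]
        # Keep only the best-scoring child for each parent document.
        if parent_id not in best or score > best[parent_id]:
            best[parent_id] = score
    # Return the k parents with the highest scores, best first.
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


hits = [(0, 0.9), (1, 0.8), (2, 0.7), (3, 0.95)]
parent_of = {0: 10, 1: 10, 2: 11, 3: 12}
print(top_k_parents(hits, parent_of, k=2))
```

Without deduplication, children 0 and 1 could both occupy result slots for parent 10, so fewer than k distinct parents would be returned.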

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 8c98265)

Co-authored-by: Heemin Kim <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 6abec19)

Co-authored-by: Heemin Kim <[email protected]>
* apply boost

Signed-off-by: panguixin <[email protected]>

* add change log

Signed-off-by: panguixin <[email protected]>

---------

Signed-off-by: panguixin <[email protected]>
(cherry picked from commit fcbfef1)

Co-authored-by: panguixin <[email protected]>
…#1422)

(cherry picked from commit 47728ce)
Signed-off-by: Junqiu Lei <[email protected]>
Co-authored-by: Junqiu Lei <[email protected]>
…#1420)

* Remove default admin credentials

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 0c000ad)

Co-authored-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit d5538a4)

Co-authored-by: Ryan Bogan <[email protected]>
Refactors integration tests that directly access the model system index.
End users should not be directly accessing the model system index. It is
supposed to be an implementation detail. We have written RESTful
integration tests that directly access the model system index in order
to initialize the cluster state. However, we should not do this because
users should not be able to interact with it through RESTful APIs.

That being said, some of this
implementation detail leaks out into the interface. For instance, in
k-NN stats we have a stat that is the model system index status. So, in
order to test this, we do need direct access to the system index.
Similarly, for search, we execute the search against the system index
and directly return the results. This is probably a bug - but we still
need to test it.

Signed-off-by: John Mazanec <[email protected]>
(cherry picked from commit 2b963b4)
* Fix flaky model tests in k-NN

Signed-off-by: Ryan Bogan <[email protected]>

* Remove * imports

Signed-off-by: Ryan Bogan <[email protected]>

* Minor change

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 9e4251e)

Co-authored-by: Ryan Bogan <[email protected]>
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.6 to 3.9.2.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.8.6...v3.9.2)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit fe592f5)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 4c7a055)

Co-authored-by: Heemin Kim <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 8eb0776)

Co-authored-by: Naveen Tatikonda <[email protected]>
#1421) (#1448)

* Add Support for Faiss SQFP16 and enable Faiss AVX2 Optimization

* Add Patch Script to fix build on Linux CI

* Disable AVX2 support on Windows

* Add CHANGELOG

* Update Faiss Submodule

* Address Review Comments

* Update UX Interface

* Add Parameter to enable SIMD

* Update DEVELOPER_GUIDE

* Address Review Comments

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
* Update spotless and eclipse dependencies

Signed-off-by: Ryan Bogan <[email protected]>

* Update dependencies for spotless and eclipse

Signed-off-by: Ryan Bogan <[email protected]>

* Add Changelog

Signed-off-by: Ryan Bogan <[email protected]>

* Add comment

Signed-off-by: Ryan Bogan <[email protected]>

* Add resources force resolution for eclipse

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit fceb8f8)

Co-authored-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 48fcfa7)

Co-authored-by: Ryan Bogan <[email protected]>
opensearch-trigger-bot bot and others added 19 commits October 10, 2024 10:18
…nd perform exact search when there are no engine files (#2201)

* Add support to build vector data structures greedily and perform exact search when there are no engine files (#2188)

* Introduce new setting to configure when to build graph during segment creation (#2007)

Added a new updatable index setting, "build_vector_data_structure_threshold", which is
considered when deciding whether to build the graph for native engines.
This is a no-op for Lucene. It depends on using the Lucene format as a prerequisite.
We don't need to add a flag since it is only enabled if the Lucene format is
already enabled.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add integration test for binary vector values (#2142)

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Allow build graph greedily for quantization scenarios (#2175)

Previously, we only supported building greedily for
non-quantization scenarios. In this commit, we remove
that constraint; however, we cannot skip writing the quantization
state, since it is required irrespective of the type of search
executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add exact search if no native engine files are available (#2136)

* Add exact search if no engine files are in segments

When the graph is not available, the plugin returns empty results. With this change,
exact search is performed when no engine file is available in the segment.
We also don't need a version check or feature flag because the option to not build the vector
data structure is only available post 2.17.
If an index is created using a pre-2.17 version, its segments will always have engine files
and this feature will never be invoked during search.
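The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the `Segment` class, its `has_engine_file` flag, the `ann_search` callable, and the choice of squared L2 distance are all hypothetical stand-ins.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Result = List[Tuple[float, int]]  # (distance, doc_id), smaller distance is better


@dataclass
class Segment:
    """Hypothetical stand-in for an index segment."""
    vectors: Dict[int, Sequence[float]]
    has_engine_file: bool


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(segment: Segment, query: Sequence[float], k: int) -> Result:
    # Brute force: score every vector in the segment, keep the k closest.
    scored = ((l2_squared(vec, query), doc_id) for doc_id, vec in segment.vectors.items())
    return heapq.nsmallest(k, scored)


def search_segment(
    segment: Segment,
    query: Sequence[float],
    k: int,
    ann_search: Callable[[Segment, Sequence[float], int], Result],
) -> Result:
    # Use the graph when an engine file exists; otherwise fall back to
    # exact search instead of returning empty results.
    if segment.has_engine_file:
        return ann_search(segment, query, k)
    return exact_top_k(segment, query, k)


segment = Segment(vectors={1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}, has_engine_file=False)
print(search_segment(segment, [0.0, 0.0], k=2, ann_search=lambda s, q, k: []))
```

Here the segment has no engine file, so the stub `ann_search` is never called and the two nearest documents come from the brute-force path.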

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add support for radial search in exact search (#2174)

* Add support for radial search in exact search

When a threshold value is set, the k-NN plugin will not create the graph.
Hence, when a search request is triggered during that time, exact search
will return valid results. However, radial search was never included
as part of exact search. This breaks radial search when a threshold
is set and radial search is requested. In this commit, a new method
is introduced to accept a min score and return documents whose scores are greater
than the min score, similar to how radial search is performed by native
engines. This search is independent of the engine, but radial search is
supported only for the FAISS engine out of all native engines.
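A minimal sketch of a min-score filter over an exact scan is shown below. The function name `radial_exact_search`, the score formula `1 / (1 + squared L2 distance)`, and the strict `>` comparison (taken from the wording above) are assumptions for illustration; the plugin's actual method and score translation may differ.

```python
from typing import Dict, List, Sequence, Tuple


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def radial_exact_search(
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    min_score: float,
) -> List[Tuple[int, float]]:
    """Return every (doc_id, score) whose score is greater than min_score.

    Unlike top-k search, the number of results is not fixed: it depends
    on how many documents fall inside the score radius.
    """
    results = []
    for doc_id, vec in vectors.items():
        # Assumed score translation: closer vectors get scores nearer 1.0.
        score = 1.0 / (1.0 + l2_squared(vec, query))
        if score > min_score:
            results.append((doc_id, score))
    # Best matches first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


vectors = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [3.0, 0.0]}
print(radial_exact_search(vectors, [0.0, 0.0], min_score=0.4))
```

In this example, document 3 scores 0.1 and falls outside the radius, so only documents 1 and 2 are returned.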

Signed-off-by: Vijayan Balasubramanian <[email protected]>
---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 5a56829)

* Fix compilation issue due to package error

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Co-authored-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 19162c2)

Co-authored-by: Naveen Tatikonda <[email protected]>
* Bump Faiss commit from 33c0ba5 to 4eecd91

Signed-off-by: Naveen Tatikonda <[email protected]>

* Update Faiss patches after commit bump

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit d9c7ba5)

Co-authored-by: Naveen Tatikonda <[email protected]>
Currently, for product quantization, we set the calculated compression
level to NOT_CONFIGURED. The main issue with this is that if a user sets
up a disk-based index with PQ, no re-scoring will happen by default.

This change adds the calculation so that the proper re-scoring will
happen. The formula is fairly straightforward =>
actual compression = (d * 32) / (m * code_size). Then, we round to the
nearest compression level (because we only support discrete compression
levels).

One small issue with this is that if PQ is configured to have
compression > 64x, the value will be 64x. Functionally, the only issue
will be that we may not be as aggressive on oversampling for on disk
mode.
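The calculation above can be sketched as follows, where `d` is the vector dimension and `m` and `code_size` are the PQ parameters from the formula. The set of discrete levels (1x through 64x in powers of two) and the "nearest by absolute difference" rounding rule are assumptions made for illustration; the function name `pq_compression_level` is hypothetical.

```python
# Assumed discrete compression levels; anything above 64x maps to 64x.
LEVELS = (1, 2, 4, 8, 16, 32, 64)


def pq_compression_level(d: int, m: int, code_size: int) -> int:
    """Map a PQ configuration to the nearest supported compression level.

    actual compression = (d * 32) / (m * code_size), i.e. bits per
    full-precision float32 vector divided by bits per PQ-encoded vector.
    """
    if d <= 0 or m <= 0 or code_size <= 0:
        raise ValueError("d, m and code_size must be positive")
    actual = (d * 32) / (m * code_size)
    # Pick the level closest to the actual ratio; ratios beyond 64 are
    # closest to 64, which reproduces the cap mentioned above.
    return min(LEVELS, key=lambda level: abs(level - actual))


print(pq_compression_level(d=128, m=16, code_size=8))  # ratio 32
print(pq_compression_level(d=768, m=8, code_size=8))   # ratio 384, capped
```

For `d=128, m=16, code_size=8` the ratio is 4096 / 128 = 32, which is already a supported level. For `d=768, m=8, code_size=8` the ratio is 384, so the result is the 64x cap.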

Signed-off-by: John Mazanec <[email protected]>
(cherry picked from commit 228aead)

Co-authored-by: John Mazanec <[email protected]>
* Introduce a loading layer in NMSLIB. (#2185)

* Introduce a loading layer in NMSLIB.

Signed-off-by: Dooyong Kim <[email protected]>

* Added NMSLIB istream implementation.

Signed-off-by: Dooyong Kim <[email protected]>

* Fix integer overflow issue when passing read size for loading NMSLIB vector index.

Signed-off-by: Dooyong Kim <[email protected]>

* Added unit test for NMSLIB loading layer.

Signed-off-by: Dooyong Kim <[email protected]>

* Made a patch in NMSLIB to avoid frequently calling JNI for better loading index performance.

Signed-off-by: Dooyong Kim <[email protected]>

* Compliance constexpr function in C++11 having nullstatement.

Signed-off-by: Dooyong Kim <[email protected]>

---------

Signed-off-by: Dooyong Kim <[email protected]>
Co-authored-by: Dooyong Kim <[email protected]>

* Fixed a failure to resolve a package in an import statement.

Signed-off-by: Dooyong Kim <[email protected]>

* Move the element in the changelog from 3.x to 2.x.

Signed-off-by: Dooyong Kim <[email protected]>

---------

Signed-off-by: Dooyong Kim <[email protected]>
Co-authored-by: Dooyong Kim <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit d52ee14)

Co-authored-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Dooyong Kim <[email protected]>

(cherry picked from commit e5599aa)
* Add Release Notes for 2.18.0.0

Signed-off-by: Vikasht34 <[email protected]>

* Add Release Notes for 2.18.0.0

Signed-off-by: Vikasht34 <[email protected]>

---------

Signed-off-by: Vikasht34 <[email protected]>
(cherry picked from commit 8f2d911)

Co-authored-by: Vikasht34 <[email protected]>
…2231)

* Update approximate_threshold to 15K documents (#2229)

* Update threshold to 15K documents

After comparing indexing and search performance, we are updating
default value to be 15000.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Fix bwc test

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test method

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Flush data after index

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Remove update cluster setting

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* update warmup rolling upgrade scenario

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
…2245)

Signed-off-by: Dooyong Kim <[email protected]>
Co-authored-by: Dooyong Kim <[email protected]>
(cherry picked from commit a029fa8)

Co-authored-by: Doo Yong Kim <[email protected]>
Signed-off-by: opensearch-ci-bot <[email protected]>
Co-authored-by: opensearch-ci-bot <[email protected]>
Signed-off-by: Kunal Kotwani <[email protected]>
(cherry picked from commit 3554ebf)

Co-authored-by: Kunal Kotwani <[email protected]>
…bsolute path in the filesystem. (#2248)

Signed-off-by: Dooyong Kim <[email protected]>
Co-authored-by: Dooyong Kim <[email protected]>
* Update default engine to FAISS

Since Faiss supports more features than NMSLIB, and we have seen
data points indicating that more vector search
users are interested in Faiss, we are updating the default
engine to Faiss. This will benefit users who prefer
to use defaults while working with vector search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update legacy mapping

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Create legacy mapping only up to V_2_17_2

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test engine

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test method

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 7d34456)

Co-authored-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Dooyong Kim <[email protected]>
Co-authored-by: Dooyong Kim <[email protected]>
(cherry picked from commit a07bad1)

Co-authored-by: Doo Yong Kim <[email protected]>
… update github ci runner for macos (#2279) (#2280)

* Upgrade bytebuddy and objenesis version to match OpenSearch core

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* update github runner to macos-13

macos-12 is deprecated, upgrading to macos 13

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Install libomp before running build

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 4992736)

Co-authored-by: Vijayan Balasubramanian <[email protected]>
…sed vector search (#2281) (#2282)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 2d1a408)

Co-authored-by: Navneet Verma <[email protected]>
…unning exact search for segments with no vector field (#2278) (#2285)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 7523cc3)

Co-authored-by: Navneet Verma <[email protected]>