Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Knn Doc ingestion throws ClassCastException when used with OS plugin implementing IndexStorePlugin #1228

Closed
jitendra-titaniam opened this issue Oct 10, 2023 · 12 comments
Assignees
Labels
indexing-improvements This label should be attached to all the github issues which will help improving the indexing time. question Further information is requested storage-improvements v2.19.0

Comments

@jitendra-titaniam
Copy link

What is the bug?
Opensearch IndexStorePlugin allows plugin developers to provide custom implementation of org.apache.lucene.store.Directory
Plugin developer can write their own implementation of Directory.createOutput(String filename, IOContext context) and Directory.openInput(String filename, IOContext context) methods.
The current implementation in KNN80DocValuesConsumer.addKNNBinaryField gives ClassCastException if any other implementation of Directory is used instead of org.apache.lucene.store.FSDirectory.
Further more it does not use the Directory abstraction to createOutput rather writes to the IndexPath directly.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a Opensearch Plugin of type IndexStorePlugin. Implement getDirectoryFactories(). For index.store.type of titaniam this method returns an subclass of FsDirectoryFactory that creates custom implementation of Directory rather than an FSDirectory.
  2. Install the plugin and restart Opensearch
  3. Create an index
PUT housing-index1
{
  "settings": {
    "index.knn": true,
    "index.store.type": "titaniam"
  },
  "mappings": {
    "properties": {
      "housing-vector": {
        "type": "knn_vector",
        "dimension": 3
      },
      "title": {
        "type": "text"
      },
      "price": {
        "type": "long"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
  1. Ingest a document
POST housing-index1/_doc
{
  "housing-vector": [
    10,
    20,
    30
  ],
  "title": "2 bedroom in downtown Seattle",
  "price": "2800",
  "location": "47.71, 122.00"
}

Following expection is shown in the opensearch.log

[2023-09-26T20:13:00,327][WARN ][o.o.i.e.Engine           ] [kanchipuram.local] [housing-index1][0] failed engine [refresh failed source[schedule]]
java.lang.ClassCastException: class com.titaniamlabs.lucene.store.TitaniamDirectory cannot be cast to class org.apache.lucene.store.FSDirectory (com.titaniamlabs.lucene.store.TitaniamDirectory is in unnamed module of loader java.net.FactoryURLClassLoader @d5ce97f; org.apache.lucene.store.FSDirectory is in unnamed module of loader 'app')
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addKNNBinaryField(KNN80DocValuesConsumer.java:131) ~[?:?]
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addBinaryField(KNN80DocValuesConsumer.java:78) ~[?:?]
	at org.apache.lucene.index.BinaryDocValuesWriter.flush(BinaryDocValuesWriter.java:132) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.IndexingChain.writeDocValues(IndexingChain.java:400) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:258) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:392) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:493) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.7.0.jar:2.7.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:432) ~[opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:412) ~[opensearch-2.7.0.jar:2.7.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1846) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1825) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:4172) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:983) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1116) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.7.0.jar:2.7.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

What is the expected behavior?
Knn DocValuesConsumer should use the lucene Directory abstraction to add Binary Field, so that OS plugins implementing IndexStorePlugin can work.

What is your host/environment?
Happening on all envs.

Do you have any screenshots?
NA

Do you have any additional context?

  1. Knn search on Elasticsearch 8.7.0 works with this plugin that implements IndexStorePlugin.

  2. Following is the failing line. It is casting to FSDirectory

indexPath = Paths.get(((FSDirectory) (FilterDirectory.unwrap(state.directory))).getDirectory().toString(), engineFileName)

Following codes then goes on to directly write to lucene file without going through any of the Directory abstractions.

indexCreator = () -> createKNNIndexFromTemplate(model.getModelBlob(), pair, knnEngine, indexPath);

and

These codes has to be resolved to write through Directory.

@jitendra-titaniam jitendra-titaniam added bug Something isn't working untriaged labels Oct 10, 2023
@vamshin
Copy link
Member

vamshin commented Oct 25, 2023

@jitendra-titaniam thanks for raising the issue. Would you be able to contribute the fix?

@jmazanec15
Copy link
Member

So this is tricky. The problem is that the way we access the "native" index files is by directly passing the file path to the native libraries (i.e. faiss and nmslib). This means we are not accessing the index via the IndexInput abstraction, which is returned from the directory. This becomes tricky when using things like remove stores.

Workarounds are kind of tough:

  1. First, we can ensure that the directory being used ends up being an FSDirectory. Im assuming any component implementing IndexStorePlugin will need something locally to search around. The basic idea would be to unwrap until we get there
  2. You could use lucene engine if you need something quicker.

Not sure what best route forward is for long-term solution. Implement native engines by reading from IndexInput would be very challenging if not impossible. We might be able to wrap this in more friendly abstraction that will work better with IndexStorePlugin though.

@vamshin vamshin added the question Further information is requested label Nov 2, 2023
@navneet1v navneet1v moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Apr 1, 2024
@vamshin vamshin moved this from Backlog (Hot) to Now(This Quarter) in Vector Search RoadMap Apr 1, 2024
@vamshin vamshin moved this from Now(This Quarter) to Backlog (Hot) in Vector Search RoadMap Jul 2, 2024
@vamshin vamshin moved this from Backlog (Hot) to Backlog in Vector Search RoadMap Jul 2, 2024
@navneet1v
Copy link
Collaborator

navneet1v commented Aug 12, 2024

@jitendra-titaniam I have created this GH issue for using IndexInput for graph files. I am hoping it can solve the issue you faced: #2033

@github-project-automation github-project-automation bot moved this from Backlog to ✅ Done in Vector Search RoadMap Oct 16, 2024
@jmazanec15 jmazanec15 reopened this Oct 16, 2024
@jmazanec15 jmazanec15 moved this from ✅ Done to 2.18.0 in Vector Search RoadMap Oct 16, 2024
@jmazanec15
Copy link
Member

Will be completed in 2.18 once loading layer changes are completed.

@navneet1v
Copy link
Collaborator

@jitendra-titaniam this is the RFC for loading layer: #2033 please review and see if this can solve your issue. I believe it will

@dblock dblock removed the untriaged label Oct 21, 2024
@dblock
Copy link
Member

dblock commented Oct 21, 2024

[Catch All Triage - 1, 2]

@vamshin
Copy link
Member

vamshin commented Jan 16, 2025

@jitendra-titaniam looks like this change went in 2.18. Do you want to validate once?

@navneet1v
Copy link
Collaborator

@jitendra-titaniam
Adding on top of what @vamshin has said the change for the reading of vectors files went in 2.18 and writing of the vector file is in 2.19. :)

@navneet1v navneet1v removed the bug Something isn't working label Jan 16, 2025
@navneet1v navneet1v moved this from 2.18.0 to 2.19.0 in Vector Search RoadMap Jan 16, 2025
@jitendra-titaniam
Copy link
Author

We have validated with the Nov 14 nightly build of 2.19. And knn works with our IndexStorePlugin.
We will test once more with new nightly build and comment here.

@navneet1v
Copy link
Collaborator

We have validated with the Nov 14 nightly build of 2.19. And knn works with our IndexStorePlugin. We will test once more with new nightly build and comment here.

Thanks this is pretty exciting to know.

@navneet1v navneet1v added indexing-improvements This label should be attached to all the github issues which will help improving the indexing time. storage-improvements v2.19.0 labels Jan 22, 2025
@jitendra-titaniam
Copy link
Author

Tested with latest OS 2.19.0 nightly build (On Jan 22 2025). Knn works fine.
Thanks you all for getting this done.

@navneet1v
Copy link
Collaborator

Tested with latest OS 2.19.0 nightly build (On Jan 22 2025). Knn works fine. Thanks you all for getting this done.

Thanks for the confirmation. Resolving the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
indexing-improvements This label should be attached to all the github issues which will help improving the indexing time. question Further information is requested storage-improvements v2.19.0
Projects
Status: Done
Development

No branches or pull requests

6 participants