
[BUG] KNN Efficient filter regression 2.16->2.17 and 2.18 on multiple indices #2428

Open
DagW opened this issue Jan 22, 2025 · 2 comments
Labels
bug Something isn't working k-NN untriaged


@DagW

DagW commented Jan 22, 2025

Describe the bug

When searching more than one index and filtering on the _index field with k-NN efficient filtering, the query sometimes returns nothing, sometimes returns only part of the expected result, and sometimes returns more documents than the filter allows.

It seems that filtering on _index works once, but the results quickly degrade.

This worked consistently in 2.16, but does not work consistently in 2.17 or 2.18.

Related component

Search:Query Capabilities

To Reproduce

PUT index_a
{
  "mappings": {
    "properties": {
      "filter_a":{"type":"keyword"},
      "filter_b":{"type":"keyword"},
      "_semantic": {
        "properties": {
          "vectorFaiss": {
            "type": "knn_vector",
            "dimension": 4,
            "method": {
              "engine": "faiss",
              "space_type": "l2",
              "name": "hnsw"
            }
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "knn": "true",
      "number_of_replicas": "0"
    }
  }
}

PUT index_b
{
  "mappings": {
    "properties": {
      "filter_a":{"type":"keyword"},
      "filter_b":{"type":"keyword"},
      "_semantic": {
        "properties": {
          "vectorFaiss": {
            "type": "knn_vector",
            "dimension": 4,
            "method": {
              "engine": "faiss",
              "space_type": "l2",
              "name": "hnsw"
            }
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "knn": "true",
      "number_of_replicas": "0"
    }
  }
}

POST _bulk
{"index":{"_index":"index_a","_id":"1"}}
{"_semantic.vectorFaiss":[-0.01424408,-0.08703613,0.07312012,-0.019836426],"parking":"true","filter_a":"1", "filter_b": "2"}
{"index":{"_index":"index_a","_id":"2"}}
{"_semantic.vectorFaiss":[-0.01424408,-0.00703613,0.07312012,-0.019836426],"parking":"true","filter_a":"2", "filter_b": "3"}
{"index":{"_index":"index_a","_id":"3"}}
{"_semantic.vectorFaiss":[-0.01424408,-0.8703613,0.07312012,-0.019836426],"parking":"true","filter_a":"4", "filter_b": "5"}
{"index":{"_index":"index_b","_id":"1"}}
{"_semantic.vectorFaiss":[-0.01424408,-0.08703613,0.07312012,-0.019836426],"parking":"true","filter_a":"1", "filter_b": "2"}
{"index":{"_index":"index_b","_id":"2"}}
{"_semantic.vectorFaiss":[-0.01424408,-0.00703613,0.07312012,-0.019836426],"parking":"true","filter_a":"2", "filter_b": "3"}
{"index":{"_index":"index_b","_id":"3"}}
{"_semantic.vectorFaiss":[-0.01424408,-0.8703613,0.07312012,-0.019836426],"parking":"true","filter_a":"4", "filter_b": "5"}


GET /index_a,index_b/_search
{
  "query": {
    "knn": {
      "_semantic.vectorFaiss": {
        "k": 3,
        "filter": {
          "bool": {
            "should": [
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "filter_a": "1"
                      }
                    },
                    {
                      "term": {
                        "_index": "index_a"
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "filter_b": "2"
                      }
                    },
                    {
                      "term": {
                        "_index": "index_c"
                      }
                    }
                  ]
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        "vector": [
          0.0073,
          -0.01424408,
          -0.08703613,
          0.07312012
        ]
      }
    }
  }
}

Running the search repeatedly yields one of three different results, each with roughly equal probability:

  • Incorrect result (a hit from index_b is returned, even though the filter excludes it)
  • No result (one hit is expected)
  • Correct result
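To measure how often each outcome occurs, the query can be replayed in a loop and each response bucketed. The following is a minimal Python sketch, not part of the original report: `classify`, `tally`, and `make_sender` are hypothetical helper names, the expected hit set `{("index_a", "1")}` is derived by hand from the filter above, and `make_sender` assumes an unauthenticated cluster reachable over plain HTTP.

```python
import json
import urllib.request
from collections import Counter
from typing import Callable, Dict

# Hand-derived from the filter: only index_a/_id 1 satisfies it.
EXPECTED = {("index_a", "1")}


def classify(response: Dict) -> str:
    """Bucket one _search response into the outcomes described in the report."""
    hits = {(hit["_index"], hit["_id"]) for hit in response["hits"]["hits"]}
    if not hits:
        return "no_result"
    if hits - EXPECTED:
        return "extra_hits"  # e.g. a hit from index_b that the filter excludes
    if hits == EXPECTED:
        return "correct"
    return "partial"  # a non-empty subset of the expected hits


def tally(send_query: Callable[[], Dict], runs: int = 50) -> Counter:
    """Run the same query `runs` times and count each outcome."""
    return Counter(classify(send_query()) for _ in range(runs))


def make_sender(base_url: str, body: Dict) -> Callable[[], Dict]:
    """Return a callable that POSTs `body` to index_a,index_b/_search.

    Assumes plain HTTP without authentication; a secured cluster needs TLS
    and credentials added to the request.
    """
    url = f"{base_url}/index_a,index_b/_search"
    payload = json.dumps(body).encode("utf-8")

    def send() -> Dict:
        request = urllib.request.Request(
            url,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return send
```

For example, with the search body above loaded as a dict named `query_body`, `tally(make_sender("http://localhost:9200", query_body), runs=100)` returns a `Counter` of outcomes. Since the report says 2.16 behaved consistently, a 2.16 cluster should count only `correct`. Passing the sender in as a callable keeps the classification logic testable without a live cluster.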

Example incorrect result

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.9614888,
    "hits": [
      {
        "_index": "index_a",
        "_id": "1",
        "_score": 0.9614888,
        "_source": {
          "_semantic.vectorFaiss": [
            -0.01424408,
            -0.08703613,
            0.07312012,
            -0.019836426
          ],
          "parking": "true",
          "filter_a": "1",
          "filter_b": "2"
        }
      },
      {
        "_index": "index_b", <---- index_b does not match the filter
        "_id": "1",
        "_score": 0.9614888,
        "_source": {
          "_semantic.vectorFaiss": [
            -0.01424408,
            -0.08703613,
            0.07312012,
            -0.019836426
          ],
          "parking": "true",
          "filter_a": "1",
          "filter_b": "2"
        }
      }
    ]
  }
}

No result

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}

Correct result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.9614888,
    "hits": [
      {
        "_index": "index_a",
        "_id": "1",
        "_score": 0.9614888,
        "_source": {
          "_semantic.vectorFaiss": [
            -0.01424408,
            -0.08703613,
            0.07312012,
            -0.019836426
          ],
          "parking": "true",
          "filter_a": "1",
          "filter_b": "2"
        }
      }
    ]
  }
}

Expected behavior

Correct result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.9614888,
    "hits": [
      {
        "_index": "index_a",
        "_id": "1",
        "_score": 0.9614888,
        "_source": {
          "_semantic.vectorFaiss": [
            -0.01424408,
            -0.08703613,
            0.07312012,
            -0.019836426
          ],
          "parking": "true",
          "filter_a": "1",
          "filter_b": "2"
        }
      }
    ]
  }
}
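The expected hit can be cross-checked without OpenSearch by brute force. The sketch below (hypothetical names, not the plugin's implementation) applies the same bool filter to the six bulk-indexed documents and scores the survivors with the l2 score documented for the k-NN plugin, 1 / (1 + squared L2 distance), which reproduces the 0.9614888 score in the responses above. It ignores per-shard k and result merging, which does not matter here because only one document passes the filter.

```python
from typing import List, Tuple

# The six documents from the _bulk request: (_index, _id, filter_a, filter_b, vector)
DOCS = [
    ("index_a", "1", "1", "2", [-0.01424408, -0.08703613, 0.07312012, -0.019836426]),
    ("index_a", "2", "2", "3", [-0.01424408, -0.00703613, 0.07312012, -0.019836426]),
    ("index_a", "3", "4", "5", [-0.01424408, -0.8703613, 0.07312012, -0.019836426]),
    ("index_b", "1", "1", "2", [-0.01424408, -0.08703613, 0.07312012, -0.019836426]),
    ("index_b", "2", "2", "3", [-0.01424408, -0.00703613, 0.07312012, -0.019836426]),
    ("index_b", "3", "4", "5", [-0.01424408, -0.8703613, 0.07312012, -0.019836426]),
]
QUERY = [0.0073, -0.01424408, -0.08703613, 0.07312012]
K = 3


def matches_filter(index: str, filter_a: str, filter_b: str) -> bool:
    """Mirror the bool filter: at least one of the two `must` clauses holds."""
    clause_1 = filter_a == "1" and index == "index_a"
    clause_2 = filter_b == "2" and index == "index_c"  # no document lives in index_c
    return clause_1 or clause_2


def l2_score(a: List[float], b: List[float]) -> float:
    """Score = 1 / (1 + squared Euclidean distance)."""
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared)


def expected_hits(docs, query, k) -> List[Tuple[str, str, float]]:
    """Filter first, then keep the k best-scoring documents (exact search)."""
    scored = [
        (index, doc_id, l2_score(vector, query))
        for index, doc_id, filter_a, filter_b, vector in docs
        if matches_filter(index, filter_a, filter_b)
    ]
    scored.sort(key=lambda hit: hit[2], reverse=True)
    return scored[:k]
```

Calling `expected_hits(DOCS, QUERY, K)` yields a single hit, index_a/_id 1; index_b/_id 1 has identical field values and vector but fails the `_index` term, so it should never appear.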

Additional Details


Host/Environment:

  • OS: Kubernetes, running OpenSearch Helm charts, with the image versions below.
  • Version: 2.17 and 2.18


@DagW DagW added bug Something isn't working untriaged labels Jan 22, 2025
@kotwanikunal
Member

@opensearch-project/admin Can you please transfer this to the kNN repository?

@gaiksaya gaiksaya transferred this issue from opensearch-project/OpenSearch Jan 23, 2025
@navneet1v
Collaborator

@DagW this is pretty weird, because as far as I know there has been no change in the filter logic between 2.16 and 2.17/2.18. This will require a deep dive to understand what is happening.

cc: @vamshin
