Implement Copeland Fusion for Hybrid Search #915
base: mainline
…n developer guide
Converting back to draft as I think I see some potential optimisations
@@ -578,10 +578,10 @@ def _to_vespa_hybrid_query(self, marqo_query: MarqoHybridQuery) -> Dict[str, Any

query = {k: v for k, v in query.items() if v is not None}

if marqo_query.hybrid_parameters.rankingMethod in {RankingMethod.RRF}: # TODO: Add NormalizeLinear
if marqo_query.hybrid_parameters.rankingMethod in [RankingMethod.RRF]: # TODO: Add NormalizeLinear
query["marqo__hybrid.alpha"] = marqo_query.hybrid_parameters.alpha
If alpha and k aren't relevant for copeland, we need validation to catch this
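One possible shape for that validation, as a minimal sketch only: the `RankingMethod` enum and `HybridParameters` class below are simplified stand-ins rather than the real Marqo models, `rrfK` is an assumed name for the "k" parameter, and the `Copeland` enum member is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RankingMethod(str, Enum):
    # Stand-in enum; the member names and values are assumptions.
    RRF = "rrf"
    Copeland = "copeland"


@dataclass
class HybridParameters:
    # Stand-in for the real parameter model; field names are illustrative.
    rankingMethod: RankingMethod = RankingMethod.RRF
    alpha: Optional[float] = None
    rrfK: Optional[int] = None

    def __post_init__(self) -> None:
        # Copeland fusion compares documents pairwise, so score-weighting
        # parameters have no effect; reject them instead of ignoring them.
        if self.rankingMethod == RankingMethod.Copeland:
            rejected = [
                name for name in ("alpha", "rrfK")
                if getattr(self, name) is not None
            ]
            if rejected:
                raise ValueError(
                    f"{', '.join(rejected)} cannot be set when rankingMethod "
                    f"is '{RankingMethod.Copeland.value}'"
                )
```

Failing fast at construction time means a request that sets `alpha` together with Copeland is rejected before any query is built, rather than having the value silently dropped.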
)

self.assertIn("hits", hybrid_res)
self.assertEqual(hybrid_res["hits"][0]["_id"], "hippo text")
Why are we not checking the score to detect score regression?
int finalLength = Math.max(hitsTensor.size(), hitsLexical.size());

// Combine hits from both lists and update the raw score attributes
Map<URI, Hit> combinedHitsMap = new LinkedHashMap<>();
Why do you need a LinkedHashMap here rather than a plain HashMap?
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
This PR adds a new fusion method for disjunct retrieval named "copeland". Drawing on social choice theory, Copeland-based fusion guarantees that a Condorcet winner (a document that a majority of the input rankings place above every other document) is ranked first if one exists. In other words, Copeland-based fusion satisfies the Condorcet criterion.
Copeland-based fusion was proposed at SIGIR 2024 by Liron Tyomkin et al.: https://dl.acm.org/doi/pdf/10.1145/3626772.3657912
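For readers unfamiliar with the method, here is a minimal Python sketch of Copeland scoring over ranked lists. It is illustrative only and is not the Java searcher code in this PR. The function name `copeland_fusion` is hypothetical, and two choices are assumptions: a document missing from a list is treated as ranked below everything in that list, and tied scores keep first-seen order.

```python
from itertools import combinations
from typing import Hashable, List, Sequence


def copeland_fusion(rankings: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse ranked lists by Copeland score (pairwise wins minus losses)."""
    # Position of each document in each input ranking.
    positions = [{doc: i for i, doc in enumerate(r)} for r in rankings]
    # Union of all documents, in first-seen order.
    candidates = list(dict.fromkeys(doc for r in rankings for doc in r))
    scores = {doc: 0 for doc in candidates}

    for a, b in combinations(candidates, 2):
        prefer_a = prefer_b = 0
        for pos in positions:
            # Assumption: a missing document ranks below all present ones.
            rank_a = pos.get(a, float("inf"))
            rank_b = pos.get(b, float("inf"))
            if rank_a < rank_b:
                prefer_a += 1
            elif rank_b < rank_a:
                prefer_b += 1
        # The pairwise winner is the document preferred by more rankings.
        if prefer_a > prefer_b:
            scores[a] += 1
            scores[b] -= 1
        elif prefer_b > prefer_a:
            scores[b] += 1
            scores[a] -= 1

    # sorted() is stable, so tied scores keep first-seen order.
    return sorted(candidates, key=lambda doc: -scores[doc])
```

A Condorcet winner beats every other document in the pairwise comparison, so it is the only document that can reach the maximum score and is always placed first. With only two input lists (tensor and lexical), a pairwise win requires that neither list prefers the other document, so ties are common and the tie-breaking rule has a large effect on the final order.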
What is the current behavior? (You can also link to an open issue here)
We don't currently have any Condorcet fusion methods.
What is the new behavior (if this is a feature change)?
We have a Condorcet-based fusion method, copeland-fusion.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No
Have unit tests been run against this PR? (Has there also been any additional testing?)
Java tests on the searcher have been run, but not the full unit test suite.
Related Python client changes (link commit/PR here)
none
Related documentation changes (link commit/PR here)
Not done yet
Other information:
Please check if the PR fulfills these requirements