Support retrieving matched elements in an array of strings #26636

Open
danitico opened this issue Mar 29, 2023 · 4 comments
@danitico

Is your feature request related to a problem? Please describe.
When we search an array of strings (imagine a list of paragraphs) with the summary set to dynamic, we receive the whole list of paragraphs, with the matched ones carrying <hi> tags that highlight the queried words. If the list of paragraphs is small, it is not a problem to receive everything and then "clean" the JSON response.

However, if we have a huge list of paragraphs, the previous "solution" won't scale.
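For illustration, the client-side "cleaning" described above could look roughly like the following. This is a minimal sketch, not Vespa code: it assumes the summary field arrives as a plain list of strings and that matched paragraphs contain the literal `<hi>` tag. The function name and sample data are hypothetical.

```python
HIGHLIGHT_TAG = "<hi>"  # opening tag mentioned in the issue; exact markup is an assumption


def keep_matched_paragraphs(paragraphs: list[str]) -> list[str]:
    """Return only the paragraphs that contain a highlight tag."""
    return [p for p in paragraphs if HIGHLIGHT_TAG in p]


# Hypothetical field value, as it might look after dynamic summarization.
paragraphs = [
    "The <hi>sun</hi> is mostly hydrogen.",
    "Rivers flow downhill.",
    "Helium is produced in the <hi>sun</hi>.",
]
matched = keep_matched_paragraphs(paragraphs)
```

The filter itself is cheap, but every paragraph still has to be read, serialized and transferred before it runs, which is the part that does not scale.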

Describe the solution you'd like
We would like a new type of summary that returns only the paragraphs that matched our query. Furthermore, it would be useful to combine this with the dynamic summary feature, so that we get the same features as for a string field.

Describe alternatives you've considered
The key question for this feature is what Vespa considers a match. As @jobergum says, Vespa has several ways to match. The current implementation covers arrays of struct and maps, where Vespa looks for exact matches of values using the sameElement operator.

I would recommend taking into account the match property defined on the field to "erase" the unimportant paragraphs from the summary.

Additional context

@jobergum

Considering standard match:text for the array, is it enough that at least one of the query terms searching the array matches? E.g., query='what is the sun made of' would match many paragraphs as the query contains frequent words.
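To make the "at least one query term" criterion concrete, here is a rough sketch. It assumes a naive lowercase word tokenizer, which is not Vespa's linguistic processing, and the function names and sample paragraphs are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Naive tokenizer: lowercase word characters only (an assumption, not Vespa's)."""
    return set(re.findall(r"\w+", text.lower()))


def matches_any_term(query: str, paragraph: str) -> bool:
    """True if the paragraph shares at least one token with the query."""
    return not tokenize(query).isdisjoint(tokenize(paragraph))


query = "what is the sun made of"
paragraphs = [
    "The sun is mostly hydrogen.",
    "Rivers flow downhill.",
    "Most of the river is shallow.",
]
matched = [p for p in paragraphs if matches_any_term(query, p)]
```

Here the third paragraph is kept even though it has nothing to do with the sun, because it shares frequent words such as "of", "the" and "is" with the query.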

@danitico
Author

@jobergum Yeah! That's it

danitico changed the title from "Support of retrieving matched elements in an array of strings" to "Support retrieving matched elements in an array of strings" on Mar 29, 2023
johans1 added this to the "later" milestone on Apr 12, 2023

@Alexander-Mark

I think an implementation of #29549 would solve this? We have the same issue (each doc has many paragraphs), and in streaming mode we sometimes see latency that is bound by disk IO, as well as the additional overhead of transferring extra text over the network.

@andreer
Member

andreer commented Jan 26, 2025

That would help: the lower-scoring paragraphs could then be removed from the result in a searcher to reduce the response size.
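The pruning step could be sketched as follows. This is plain Python showing only the selection logic, not a Vespa searcher. It assumes a per-paragraph score is available for each element, and the function name, the `(text, score)` pair layout and the sample scores are all hypothetical.

```python
def keep_top_paragraphs(scored: list[tuple[str, float]], k: int) -> list[str]:
    """Keep the k highest-scoring paragraphs, preserving their original order."""
    # Rank paragraph indices by score, highest first, and take the first k.
    top_indices = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)[:k]
    # Re-sort the surviving indices so paragraphs stay in document order.
    return [scored[i][0] for i in sorted(top_indices)]


# Hypothetical (paragraph, score) pairs for one hit.
scored = [("intro", 0.1), ("the sun is hydrogen", 0.9), ("fusion in the sun", 0.5)]
pruned = keep_top_paragraphs(scored, k=2)
```

Keeping document order rather than score order is a design choice for readability of the result. Sorting by score instead would only require dropping the final `sorted`.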
