Support retrieving matched elements in an array of strings #26636

danitico · 2023-03-29T15:50:00Z

Is your feature request related to a problem? Please describe.
When we search an array of strings (imagine a list of paragraphs) with the summary set to dynamic, we receive the whole list of paragraphs, where the matched ones carry <hi> tags highlighting the queried words. If the list of paragraphs is small, it is not a problem to receive everything and then "clean" the JSON response.

However, if we have a huge list of paragraphs, the previous "solution" won't scale.
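
For illustration, the client-side "cleaning" described above might look like the following minimal sketch. It assumes the summary field arrives as a plain list of strings and that matched elements contain a `<hi>` tag, as described in this issue; the function name and the sample data are hypothetical.

```python
from typing import List

HIGHLIGHT_OPEN = "<hi>"  # highlight tag mentioned in this issue


def keep_highlighted(paragraphs: List[str]) -> List[str]:
    """Return only the array elements that contain a highlight tag."""
    return [p for p in paragraphs if HIGHLIGHT_OPEN in p]


# Hypothetical value of the array<string> summary field in one hit.
paragraphs = [
    "The <hi>sun</hi> is mostly hydrogen and helium.",
    "Mars has two small moons.",
    "Light from the <hi>sun</hi> takes about eight minutes to reach Earth.",
]

matched = keep_highlighted(paragraphs)
for p in matched:
    print(p)
```

The whole array still has to be read and sent over the network before this filter runs, which is why the approach does not scale to large lists.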

Describe the solution you'd like
We would like to define a new type of summary that returns only the paragraphs that matched our query. Furthermore, it would be useful to combine this feature with the dynamic summary feature, so that we get the same features as for a string field.

Describe alternatives you've considered
The key to this feature is what Vespa considers a match. As @jobergum says, Vespa has several ways to match. The current implementation covers arrays of struct and maps, where Vespa looks for exact matches of values using the sameElement operator.

I would recommend taking into account the match property defined on the field to "erase" the unimportant paragraphs from the summary.

Additional context

Related to https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/
Check the slack thread here.
Registering a concrete type class breaks indexing of array<string> #23125

jobergum · 2023-03-29T17:58:45Z

Considering standard match:text for the array, is it enough that at least one of the query terms searching the array matches? E.g., query='what is the sun made of' would match many paragraphs as the query contains frequent words.
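
A rough sketch of the "at least one query term matches" rule, to show why frequent words make many paragraphs qualify. The tokeniser here is a simplified stand-in and not Vespa's linguistic processing; the function names and sample paragraphs are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase and split on non-alphanumeric characters (simplified)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_any_term(query: str, paragraph: str) -> bool:
    """True if the paragraph shares at least one token with the query."""
    return bool(tokens(query) & tokens(paragraph))


query = "what is the sun made of"
paragraphs: List[str] = [
    "The sun is made of hydrogen and helium.",
    "Mars is the fourth planet.",
    "Jupiter has many moons.",
]

matched = [p for p in paragraphs if matches_any_term(query, p)]
for p in matched:
    print(p)
```

Under this rule the second paragraph is kept only because it contains "is" and "the", which is the concern raised above.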

danitico · 2023-03-29T20:05:44Z

@jobergum Yeah! That's it

Alexander-Mark · 2025-01-23T12:15:42Z

I think an implementation of #29549 would solve this? We have the same issue (each doc has many paragraphs) and in streaming mode we sometimes experience latency upper bounded by disk IO, as well as additional overhead of transferring extra text over the network.

andreer · 2025-01-26T18:18:13Z

That would help: the lower-scoring paragraphs could then be removed from the result in a searcher to reduce the response size.

danitico changed the title ~~Support of retrieving matched elements in an array of strings~~ Support retrieving matched elements in an array of strings Mar 29, 2023

jobergum added the enhancement label Mar 30, 2023

johans1 added this to the later milestone Apr 12, 2023

andreer mentioned this issue Jan 31, 2025

Add matched-elements-only support for index fields #30827

Closed
