Adding documentation for filter search in OpenSearch #7900

Merged
34 commits
9235dac
adding documentation for filter search in OpenSearch
leanneeliatra Aug 2, 2024
3db58b6
Update _search-plugins/filter-search.md
hdhalter Aug 2, 2024
8f35e20
Merge branch 'main' into adding-filter-search-results
leanneeliatra Aug 6, 2024
ab29e6a
reviewdog updates
leanneeliatra Aug 6, 2024
03e8674
Merge branch 'main' into adding-filter-search-results
leanneeliatra Aug 7, 2024
434aafb
Merge branch 'main' into adding-filter-search-results
leanneeliatra Aug 15, 2024
f7d4cd0
Merge branch 'main' into adding-filter-search-results
leanneeliatra Aug 20, 2024
10f9f8f
Merge branch 'main' into adding-filter-search-results
vagimeli Aug 28, 2024
c88e601
Merge branch 'main' into adding-filter-search-results
leanneeliatra Aug 29, 2024
c625110
Update filter-search.md
vagimeli Sep 3, 2024
59d3f09
Merge branch 'main' into adding-filter-search-results
vagimeli Sep 3, 2024
f952109
Merge branch 'main' into adding-filter-search-results
leanneeliatra Sep 10, 2024
5674a02
Merge branch 'main' into adding-filter-search-results
leanneeliatra Sep 23, 2024
f923669
Merge branch 'main' into adding-filter-search-results
vagimeli Sep 24, 2024
15b97e6
Merge branch 'main' into adding-filter-search-results
vagimeli Sep 24, 2024
71b640d
Merge branch 'main' into adding-filter-search-results
vagimeli Oct 8, 2024
1a23b50
Incorporating review comments.
leanneeliatra Oct 15, 2024
8aa29b6
Update _search-plugins/filter-search.md
vagimeli Oct 15, 2024
28aa7cd
Update _search-plugins/filter-search.md
vagimeli Oct 15, 2024
10f0181
Update filter-search.md
vagimeli Oct 15, 2024
bdf6042
Update _search-plugins/filter-search.md
vagimeli Oct 15, 2024
85e90ce
Merge branch 'main' into adding-filter-search-results
vagimeli Oct 15, 2024
3017860
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
7552312
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
24fddb0
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
1efb63b
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
c04ef8f
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
7f1f9b7
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
d9f1347
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
865da34
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
e37867a
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
c444b4a
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
77100a2
Update _search-plugins/filter-search.md
vagimeli Oct 16, 2024
05211a3
Merge branch 'main' into adding-filter-search-results
vagimeli Oct 16, 2024
316 changes: 316 additions & 0 deletions _search-plugins/filter-search.md
@@ -0,0 +1,316 @@
---
layout: default
title: Filter search results
nav_order: 36
---

# Filter search results

In OpenSearch, you can filter search results using two main approaches. The first is a [DSL Boolean query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html) with a `filter` clause, which applies filters to both search hits and aggregations.

You can also filter search results with the `post_filter` parameter in the search API, which applies filters only to search hits, not aggregations.

#### Table of contents
1. TOC
{:toc}

---

## Using `post_filter` to filter search results

The `post_filter` parameter lets you calculate aggregations on a broader result set and then narrow down the search hits. You can also rescore the hits that remain after the post filter is applied in order to improve relevance and reorder the results.

### Example of filtering search results

1. Create an index of products:

```
PUT /electronics
{
  "mappings": {
    "properties": {
      "brand": { "type": "keyword" },
      "category": { "type": "keyword" },
      "price": { "type": "float" },
      "features": { "type": "keyword" }
    }
  }
}
```

2. Index data:

```
PUT /electronics/_doc/1?refresh
{
  "brand": "BrandX",
  "category": "Smartphone",
  "price": 699.99,
  "features": ["5G", "Dual Camera"]
}

PUT /electronics/_doc/2?refresh
{
  "brand": "BrandX",
  "category": "Laptop",
  "price": 1199.99,
  "features": ["Touchscreen", "16GB RAM"]
}

PUT /electronics/_doc/3?refresh
{
  "brand": "BrandY",
  "category": "Smartphone",
  "price": 799.99,
  "features": ["5G", "Triple Camera"]
}
```

3. Use a Boolean query with a `filter` clause to show only smartphones from BrandX:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "brand": "BrandX" }},
        { "term": { "category": "Smartphone" }}
      ]
    }
  }
}
```

To refine search results further, you can add a `terms` aggregation. For example, a `category` field can let users limit their search results to BrandX smartphones or tablets:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "brand": "BrandX" }},
        { "term": { "category": "Smartphone" }}
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": { "field": "category" }
    }
  }
}
```
Because the query filters on both brand and category, the `categories` aggregation counts only BrandX smartphones, so with the sample data it returns a single `Smartphone` bucket.

To display how many BrandX products are available in different price ranges while returning only smartphones as search hits, use a `post_filter`:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "BrandX" }
      }
    }
  },
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    },
    "category_smartphone": {
      "filter": {
        "term": { "category": "Smartphone" }
      },
      "aggs": {
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              { "to": 500 },
              { "from": 500, "to": 1000 },
              { "from": 1000 }
            ]
          }
        }
      }
    }
  },
  "post_filter": {
    "term": { "category": "Smartphone" }
  }
}
```
This query finds all products from BrandX. The `price_ranges` aggregation returns price range counts for all BrandX products, and the `category_smartphone` aggregation returns the same price ranges for BrandX smartphones only. The `post_filter` then narrows the search hits to smartphones without affecting either aggregation.
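The order of operations in the request above can be sketched in plain Python. This is only a model of the semantics applied to the three sample documents, not a call to OpenSearch; the helper names `price_bucket` and `count_by_bucket` and the bucket labels are made up for illustration.

```python
# Plain-Python model of the request above; no OpenSearch client is involved.
docs = [
    {"brand": "BrandX", "category": "Smartphone", "price": 699.99},
    {"brand": "BrandX", "category": "Laptop", "price": 1199.99},
    {"brand": "BrandY", "category": "Smartphone", "price": 799.99},
]


def price_bucket(price):
    """Mirror the three ranges: `from` is inclusive, `to` is exclusive."""
    if price < 500:
        return "*-500"
    if price < 1000:
        return "500-1000"
    return "1000-*"


def count_by_bucket(items):
    """Count documents per price bucket (hypothetical bucket labels)."""
    counts = {"*-500": 0, "500-1000": 0, "1000-*": 0}
    for item in items:
        counts[price_bucket(item["price"])] += 1
    return counts


# 1. The query's filter clause decides which documents everything else sees.
matched = [d for d in docs if d["brand"] == "BrandX"]

# 2. Aggregations run on the query matches, before post_filter is applied.
price_ranges = count_by_bucket(matched)
category_smartphone = count_by_bucket(
    [d for d in matched if d["category"] == "Smartphone"]
)

# 3. post_filter trims only the returned hits.
hits = [d for d in matched if d["category"] == "Smartphone"]

print(price_ranges)
print(category_smartphone)
print([h["price"] for h in hits])
```

In this model, `price_ranges` still counts the BrandX laptop even though the laptop never appears in `hits`, which is the behavior that distinguishes `post_filter` from a `filter` clause in the query.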

### Rescoring filtered search results

Rescoring improves the relevance of the returned search results. To stay efficient, it applies a more complex scoring algorithm only to the top results rather than to the entire dataset. Each shard processes the rescore request before returning its results to the coordinating node, which then merges and sorts the final results.

Example of using a rescore query:
```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "brand": "BrandX" }},
        { "term": { "category": "Smartphone" }}
      ]
    }
  },
  "post_filter": {
    "term": { "category": "Smartphone" }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match": {
          "features": "5G"
        }
      },
      "query_weight": 1.0,
      "rescore_query_weight": 2.0
    }
  }
}
```
In this example, the `rescore` section reorders up to the top 50 BrandX smartphones on each shard based on whether their features include "5G".

When using pagination, avoid changing `window_size` from one page to the next because doing so can shift the order of results and confuse users.
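The effect of the window on ordering can be illustrated with a simplified single-shard model in Python. The assumptions here are loud ones: hits outside the window are simply appended in their original order, the scores and the `bonus` table are invented, and `rescore_window` is a hypothetical helper rather than anything OpenSearch exposes.

```python
def rescore_window(hits, window_size, rescore_fn):
    """Rescore only the first `window_size` hits and re-sort that window.

    `hits` is a list of (doc_id, score) pairs sorted by descending score.
    Hits outside the window keep their original relative order (a
    simplifying assumption of this model).
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = sorted(
        ((doc_id, rescore_fn(doc_id, score)) for doc_id, score in window),
        key=lambda hit: hit[1],
        reverse=True,
    )
    return rescored + rest


# Invented first-pass scores and an invented second-pass bonus.
hits = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
bonus = {"c": 5.0, "d": 5.0}


def add_bonus(doc_id, score):
    """Second-pass score: original score plus a per-document bonus."""
    return score + bonus.get(doc_id, 0.0)


small = rescore_window(hits, 2, add_bonus)  # "c" and "d" are outside the window
large = rescore_window(hits, 4, add_bonus)  # "c" and "d" are rescored

print([doc_id for doc_id, _ in small])
print([doc_id for doc_id, _ in large])
```

The same query produces two different orderings depending only on the window, which is why a `window_size` that changes between pages can make documents repeat or disappear as a user pages through results.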

### Query rescorer

In OpenSearch, the query rescorer refines search results by applying an additional query to the top results from the initial search. Instead of evaluating every document, the rescorer evaluates only the top documents on each shard, up to the number set by the `window_size` parameter, which defaults to 10. Limiting the window keeps relevance adjustments efficient.

The rescore query’s influence is balanced with the original query through the `query_weight` and `rescore_query_weight` parameters, both set to 1 by default.
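As a worked example of how the two weights interact, the following Python sketch computes the combined score as a weighted sum, which is what the default `score_mode` of `total` does. The input scores are invented, and `combined_score` is a hypothetical helper for illustration only.

```python
def combined_score(original, rescore, query_weight=1.0, rescore_query_weight=1.0):
    """Weighted sum of the original and rescore query scores (score_mode total)."""
    return query_weight * original + rescore_query_weight * rescore


# Invented scores for a single document inside the rescore window.
original_score = 2.0
rescore_score = 0.5

with_defaults = combined_score(original_score, rescore_score)
with_boost = combined_score(original_score, rescore_score, rescore_query_weight=2.0)

print(with_defaults)
print(with_boost)
```

Raising `rescore_query_weight` above `query_weight` lets the rescore query dominate the final ordering within the window, while the reverse keeps the original ranking mostly intact.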

#### Query rescorer example

1. Create an index:

```
PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" },
      "views": { "type": "integer" }
    }
  }
}
```

2. Add sample documents:

```
POST /articles/_doc/1
{
  "title": "OpenSearch Basics",
  "content": "Learn the basics of OpenSearch with this guide.",
  "views": 150
}

POST /articles/_doc/2
{
  "title": "Advanced OpenSearch Techniques",
  "content": "Explore advanced features and techniques in OpenSearch.",
  "views": 300
}

POST /articles/_doc/3
{
  "title": "OpenSearch Performance Tuning",
  "content": "Optimize the performance of your OpenSearch cluster.",
  "views": 450
}
```

3. Perform a search with query rescorer:

This example query uses the query rescorer to refine the results with a phrase match on the `content` field. Documents that match "OpenSearch" in the `content` field are rescored using a `match_phrase` query, which gives more weight to exact phrases.

```
POST /articles/_search
{
  "query": {
    "match": {
      "content": "OpenSearch"
    }
  },
  "rescore": {
    "window_size": 10,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "content": {
            "query": "OpenSearch",
            "slop": 2
          }
        }
      },
      "query_weight": 1,
      "rescore_query_weight": 2
    }
  }
}
```
4. Perform a search with multiple rescorers:

In this example, we first apply a phrase match rescorer and then a function score rescorer to adjust the final relevance based on the number of views.
```
POST /articles/_search
{
  "query": {
    "match": {
      "content": "OpenSearch"
    }
  },
  "rescore": [
    {
      "window_size": 10,
      "query": {
        "rescore_query": {
          "match_phrase": {
            "content": {
              "query": "OpenSearch",
              "slop": 2
            }
          }
        },
        "query_weight": 0.7,
        "rescore_query_weight": 1.5
      }
    },
    {
      "window_size": 5,
      "query": {
        "score_mode": "multiply",
        "rescore_query": {
          "function_score": {
            "field_value_factor": {
              "field": "views",
              "factor": 1.2,
              "missing": 1
            }
          }
        }
      }
    }
  ]
}
```
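The arithmetic of the second rescorer can be sketched in Python. This is a rough model under stated assumptions: every document is given the same invented score of 1.0 after the first rescorer, the `function_score` query is assumed to contribute `factor * views` because no modifier is set, and both weights are left at their default of 1. The helper names are hypothetical.

```python
def field_value_factor(value, factor=1.0, missing=1.0):
    """Return factor * field value, using `missing` when the field is absent."""
    return factor * (missing if value is None else value)


def multiply_mode(original, rescore, query_weight=1.0, rescore_query_weight=1.0):
    """Combine scores the way score_mode multiply does: as a product."""
    return (query_weight * original) * (rescore_query_weight * rescore)


# Views come from the sample documents; the prior score of 1.0 is invented.
views_by_id = {1: 150, 2: 300, 3: 450}
prior_score = 1.0

final_scores = {
    doc_id: multiply_mode(prior_score, field_value_factor(views, factor=1.2))
    for doc_id, views in views_by_id.items()
}

# Rank document IDs by final score, highest first.
ranking = sorted(final_scores, key=final_scores.get, reverse=True)
print(ranking)
```

Because `multiply` scales each document's prior score by its view-based factor, documents with equal text relevance end up ordered by `views` in this model. With real scores, the text relevance and the view count both influence the result.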