diff --git a/www/docs/learn/metadata-search-filtering/ootb-metadata-filters.md b/www/docs/learn/metadata-search-filtering/ootb-metadata-filters.md
index 490b32b8e..dfdca0fa5 100644
--- a/www/docs/learn/metadata-search-filtering/ootb-metadata-filters.md
+++ b/www/docs/learn/metadata-search-filtering/ootb-metadata-filters.md
@@ -5,8 +5,9 @@ title: Default Metadata Filters

 import {Config} from '@site/docs/definitions.md';

-A few pieces of metadata expressions filterable out of the box, as they're very
-useful in a variety of situations.
+A few metadata fields are filterable out of the box: the document ID
+(`doc.id`), the section language (`part.lang`), and whether a section is a
+title (`part.is_title`). These filters are useful in a variety of situations.

 Note that you can set up additional fields to filter on by setting up
 [filter attributes](/docs/api-reference/admin-apis/create-corpus#filter-attribute) on a
@@ -14,8 +15,9 @@ corpus.


 ## `doc.id` field
+
-Each document is assigneed a unique identifier at indexing. You can use the
-`doc.id` field to retrieve or filter specific documents in your corpus.
+Each document is assigned a unique identifier at indexing. You can use the
+`doc.id` field to retrieve or filter specific documents by ID in your corpus.

 Valid filter expressions include something like:

@@ -25,6 +27,7 @@ Valid filter expressions include something like:


 ## `part.lang` field
+
 Each section of a document is evaluated for its language at index time and the
 `part.lang` field is added with a 3-character lower-case language code
 ([ISO 639-2](https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes)). For
@@ -37,12 +40,32 @@ Valid filter expressions for this would be something like:
 * `part.lang = 'eng' OR part.lang = 'deu'`

 ## `part.is_title` field
-When adding content, will add a special Boolean
-field to indicate whether the field is a title field or not. This is useful
+
+When you add content, a special Boolean field is added to indicate whether
+a section is a title. This is useful
 for a few different cases depending on how you model your data. For example,
 some users want to **only** match on a title field, or never match on a title field,
 in which case this field can be used to filter.
-To filter for title fields only, you can use: `part.is_title = true` and
-conversely `part.is_title = false` will return only non-title sections.
+This field uses three-valued logic: `true`, `false`, or unset. It is designed
+this way to avoid creating unnecessary metadata, which is in the customer's
+interest because metadata is billed. Consider this example document:
+
+* Title: "Neural Networks and Deep Learning"
+* Section 1: "Introduction to Neural Networks"
+* Section 2: "Applications of Neural Networks in AI"
+* Section 3: "Conclusion"
+
+For a query on "neural networks", each filter behaves as follows:
+
+* To return only titles, use `part.is_title = true`. You get the title
+  "Neural Networks and Deep Learning" but none of the sections.
+* To return only non-title sections, use `part.is_title = false`. You get
+  sections that contain "neural networks" but are not titles, such as
+  "Introduction to Neural Networks" and "Applications of Neural Networks in
+  AI". You **do not** get titles that contain the term.
+* However, not all documents have titles, so some sections have
+  `part.is_title` unset, and `part.is_title = false` does not match them. To
+  include these sections, use `part.is_title <> true`, which returns every
+  matching non-title section whether the field is `false` or unset.
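The three-valued behavior of `part.is_title` can be modeled with a short Python sketch. This is an illustrative model of the semantics the docs describe, not the product's filter engine. Several pieces are assumptions made for the illustration:

* The `Section` class and the `matches` and `search` helpers are hypothetical.
* Unset is represented as `None`.
* The untitled document is invented.
* As the docs imply, `=` never matches an unset field, while `<>` does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    """Hypothetical indexed part: its text and its `part.is_title` value."""
    text: str
    is_title: Optional[bool]  # None models "unset"


def matches(section: Section, op: str, value: bool) -> bool:
    """Evaluate `part.is_title <op> value` for one section.

    Assumption (inferred from the docs): `=` never matches an unset field,
    while `<>` matches any section whose field is not `value`, unset included.
    """
    if op == "=":
        return section.is_title is not None and section.is_title == value
    if op == "<>":
        return section.is_title != value
    raise ValueError(f"unsupported operator: {op}")


def search(sections: List[Section], term: str, op: str, value: bool) -> List[str]:
    """Return the text of sections that contain `term` and pass the filter."""
    return [
        s.text
        for s in sections
        if term.lower() in s.text.lower() and matches(s, op, value)
    ]


# The example document from the docs, plus a hypothetical untitled
# document whose section leaves `part.is_title` unset.
sections = [
    Section("Neural Networks and Deep Learning", True),
    Section("Introduction to Neural Networks", False),
    Section("Applications of Neural Networks in AI", False),
    Section("Conclusion", False),
    Section("Neural networks for image recognition", None),  # hypothetical
]

for op, value in [("=", True), ("=", False), ("<>", True)]:
    print(f"part.is_title {op} {str(value).lower()}:",
          search(sections, "neural networks", op, value))
```

Running the sketch produces one result list per filter:

* `= true` returns only the title.
* `= false` returns the two matching non-title sections. "Conclusion" does not contain the term.
* `<> true` returns those two sections plus the section from the untitled document.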