Merge branch 'main' into adding-edge-ngram-token-filter-docs
kolchfa-aws authored Nov 14, 2024
2 parents cb9dbb5 + c4d59f2 commit 91692fb
Showing 45 changed files with 1,674 additions and 24 deletions.
20 changes: 20 additions & 0 deletions .github/workflows/jekyll-spec-insert.yml
@@ -0,0 +1,20 @@
name: Lint and Test Jekyll Spec Insert
on:
  push:
    paths:
      - 'spec-insert/**'
  pull_request:
    paths:
      - 'spec-insert/**'
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with: { ruby-version: 3.3.0 }
      - run: bundle install
      - working-directory: spec-insert
        run: |
          bundle exec rubocop
          bundle exec rspec
52 changes: 52 additions & 0 deletions .github/workflows/update-api-components.yml
@@ -0,0 +1,52 @@
name: Update API Components
on:
  workflow_dispatch:
  schedule:
    - cron: "0 0 * * 0" # Every Sunday at midnight GMT
jobs:
  update-api-components:
    if: ${{ github.repository == 'opensearch-project/documentation-website' }}
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
          fetch-depth: 0

      - run: git config --global pull.rebase true

      - uses: ruby/setup-ruby@v1
        with: { ruby-version: 3.3.0 }

      - run: bundle install

      - name: Download spec and insert into documentation
        run: bundle exec jekyll spec-insert

      - name: Get current date
        id: date
        run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_ENV

      - name: GitHub App token
        id: github_app_token
        uses: tibdex/[email protected]
        with:
          app_id: ${{ secrets.APP_ID }}
          private_key: ${{ secrets.APP_PRIVATE_KEY }}

      - name: Create pull request
        uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ steps.github_app_token.outputs.token }}
          commit-message: "Updated API components to reflect the latest OpenSearch API spec (${{ env.date }})"
          title: "[AUTOCUT] Update API components to reflect the latest OpenSearch API spec (${{ env.date }})"
          body: |
            Update API components to reflect the latest [OpenSearch API spec](https://github.com/opensearch-project/opensearch-api-specification/releases/download/main-latest/opensearch-openapi.yaml).
            Date: ${{ env.date }}
          branch: update-api-components-${{ env.date }}
          base: main
          signoff: true
          labels: autocut
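The `cron: "0 0 * * 0"` schedule above fires at 00:00 UTC every Sunday (GitHub Actions evaluates cron expressions in UTC), and the pull request branch embeds the date printed by `date +'%Y-%m-%d'`. The Ruby sketch below computes the next scheduled run after a given time and the branch name that run would produce. `next_sunday_midnight_utc` and `branch_name` are hypothetical helpers for illustration only; they are not part of the workflow or of GitHub Actions.

```ruby
require 'time'

SECONDS_PER_DAY = 86_400

# Hypothetical helper: next time a "0 0 * * 0" schedule (minute 0, hour 0,
# every Sunday) fires strictly after `from`, evaluated in UTC like GitHub Actions.
def next_sunday_midnight_utc(from)
  t = from.getutc # getutc returns a copy; Time#utc would mutate the caller's object
  midnight = Time.utc(t.year, t.month, t.day)
  # Time#wday returns 0 for Sunday through 6 for Saturday.
  days_ahead = (7 - midnight.wday) % 7
  candidate = midnight + (days_ahead * SECONDS_PER_DAY)
  # On a Sunday at or after 00:00, the next run is one week later.
  candidate <= t ? candidate + (7 * SECONDS_PER_DAY) : candidate
end

# Hypothetical helper: mirrors `date +'%Y-%m-%d'` and the `branch:` template.
def branch_name(run_time)
  "update-api-components-#{run_time.getutc.strftime('%Y-%m-%d')}"
end

run = next_sunday_midnight_utc(Time.utc(2024, 11, 14, 12, 0, 0)) # a Thursday
puts run.iso8601      # 2024-11-17T00:00:00Z
puts branch_name(run) # update-api-components-2024-11-17
```

In practice, scheduled runs can start later than the cron time, and a manual `workflow_dispatch` run uses whatever date the `Get current date` step sees when it executes.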
135 changes: 135 additions & 0 deletions DEVELOPER_GUIDE.md
@@ -0,0 +1,135 @@
# Developer guide
- [Introduction](#introduction)
- [Starting the Jekyll server locally](#starting-the-jekyll-server-locally)
- [Using the spec-insert Jekyll plugin](#using-the-spec-insert-jekyll-plugin)
  - [Inserting query parameters](#inserting-query-parameters)
  - [Inserting path parameters](#inserting-path-parameters)
  - [Inserting paths and HTTP methods](#inserting-paths-and-http-methods)
  - [Ignoring files and folders](#ignoring-files-and-folders)
  - [CI/CD](#cicd)

## Introduction

The `.md` documents in this repository are rendered into HTML pages using [Jekyll](https://jekyllrb.com/). These HTML pages are hosted on [opensearch.org](https://opensearch.org/docs/latest/).

## Starting the Jekyll server locally
You can run the Jekyll server locally to view the rendered HTML pages using the following steps:

1. Install [Ruby](https://www.ruby-lang.org/en/documentation/installation/) 3.1.0 or later for your operating system.
2. Install the required gems by running `bundle install`.
3. Run `bundle exec jekyll serve` to start the Jekyll server locally (this can take several minutes to complete).
4. Open your browser and navigate to `http://localhost:4000` to view the rendered HTML pages.

## Using the `spec-insert` Jekyll plugin
The `spec-insert` Jekyll plugin is used to insert API components into Markdown files. The plugin downloads the [latest OpenSearch specification](https://github.com/opensearch-project/opensearch-api-specification) and renders the API components from the spec. This aims to reduce the manual effort required to keep the documentation up to date.

To use this plugin, make sure that you have installed Ruby 3.1.0 or later and the required gems by running `bundle install`.

Edit your Markdown file and insert the following snippet where you want to render an API component:

```markdown
<!-- spec_insert_start
api: <API_NAME>
component: <COMPONENT_NAME>
other_param: <OTHER_PARAM>
-->

This is where the API component will be inserted.
Everything between the `spec_insert_start` and `spec_insert_end` tags will be overwritten.

<!-- spec_insert_end -->
```

Then run the following Jekyll command to render the API components:
```shell
bundle exec jekyll spec-insert
```
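To make the marker behavior concrete: the arguments between `spec_insert_start` and `-->` read like YAML key-value pairs, and everything up to `spec_insert_end` is replaced. The Ruby sketch below assumes exactly that. `render_spec_inserts`, `SPEC_INSERT_BLOCK`, and the block that stands in for component rendering are hypothetical; the plugin's actual parsing and rendering code may differ.

```ruby
require 'yaml'

# Hypothetical pattern: opening tag with YAML-style arguments, then the old
# content (discarded), then the closing tag. /m lets `.` match newlines.
SPEC_INSERT_BLOCK = /(<!--\s*spec_insert_start\s*\n(.*?)-->).*?(<!--\s*spec_insert_end\s*-->)/m

# Hypothetical sketch, not the plugin's code: replaces each block's content
# with whatever the caller's block renders from the parsed arguments.
def render_spec_inserts(markdown)
  markdown.gsub(SPEC_INSERT_BLOCK) do
    match = Regexp.last_match
    args = YAML.safe_load(match[2]) || {}
    "#{match[1]}\n#{yield(args)}\n#{match[3]}"
  end
end

doc = <<~MD
  Intro text.
  <!-- spec_insert_start
  api: cat.indices
  component: query_parameters
  -->
  Outdated table
  <!-- spec_insert_end -->
MD

puts render_spec_inserts(doc) { |args| "(#{args['component']} of #{args['api']})" }
```

Because the tags themselves are preserved, running the sketch again on its own output produces the same text, which is what makes repeated runs (or `--watch`) safe.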

If you are working on multiple Markdown files and do not want to keep running the `jekyll spec-insert` command, you can add the `--watch` (or `-W`) flag to the command to watch for changes in the Markdown files and automatically render the API components:

```shell
bundle exec jekyll spec-insert --watch
```

If your text editor does not automatically reload files from disk, you may need to reload the file manually to see the changes applied by the plugin.

The plugin will pull the newest OpenSearch API spec from its [repository](https://github.com/opensearch-project/opensearch-api-specification) if the spec file does not exist locally or if it is older than 24 hours. To tell the plugin to always pull the newest spec, you can add the `--refresh-spec` (or `-R`) flag to the command:

```shell
bundle exec jekyll spec-insert --refresh-spec
```
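The refresh rule above can be expressed as a small predicate. This sketch assumes that the spec's age is judged by the local file's modification time; `spec_needs_download?` and the local file name are illustrative assumptions, not the plugin's actual API.

```ruby
require 'fileutils'
require 'tmpdir'

SPEC_MAX_AGE = 24 * 60 * 60 # 24 hours, in seconds

# Hypothetical helper: true when the spec should be (re)downloaded, that is, when
# --refresh-spec was passed, the local copy is missing, or it is over 24 hours old.
def spec_needs_download?(path, refresh: false, now: Time.now)
  return true if refresh
  return true unless File.exist?(path)

  now - File.mtime(path) > SPEC_MAX_AGE
end

Dir.mktmpdir do |dir|
  spec = File.join(dir, 'opensearch-openapi.yaml') # assumed local file name
  puts spec_needs_download?(spec)                # true (missing)
  FileUtils.touch(spec)
  puts spec_needs_download?(spec)                # false (just written)
  puts spec_needs_download?(spec, refresh: true) # true (forced)
  stale = Time.now - (25 * 60 * 60)
  File.utime(stale, stale, spec)
  puts spec_needs_download?(spec)                # true (older than 24 hours)
end
```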

### Inserting query parameters

To insert the API query parameters table, use the following snippet:

```markdown
<!-- spec_insert_start
api: cat.indices
component: query_parameters
-->
<!-- spec_insert_end -->
```

This inserts the query parameters of the `cat.indices` API into the `.md` file with three default columns: `Parameter`, `Type`, and `Description`. When the `Required` or `Default` column is not included, that information is written in the `Description` column instead.

You can customize the query parameters table with the following columns:

- `Parameter`
- `Type`
- `Description`
- `Required`
- `Default`
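The column-folding rule described above can be sketched as follows. The row format (backticked names, `**Required**`/`_Optional_` cells, and the "Default is ..." wording) and the `Param` struct are assumptions for illustration; the plugin's real templates may render these cells differently.

```ruby
# Hypothetical parameter record; field names are illustrative.
Param = Struct.new(:name, :type, :description, :required, :default, keyword_init: true)

# Hypothetical sketch, not the plugin's code: render one Markdown table row.
# Required/Default information is folded into Description when those columns are absent.
def row(param, columns)
  description = param.description.dup
  description << ' Required.' if param.required && !columns.include?('Required')
  if !param.default.nil? && !columns.include?('Default')
    description << " Default is `#{param.default}`."
  end

  cells = columns.map do |col|
    case col
    when 'Parameter' then "`#{param.name}`"
    when 'Type' then param.type
    when 'Description' then description
    when 'Required' then param.required ? '**Required**' : '_Optional_'
    when 'Default' then param.default.nil? ? '' : "`#{param.default}`"
    end
  end
  "| #{cells.join(' | ')} |"
end

flag = Param.new(name: 'example_flag', type: 'Boolean',
                 description: 'A hypothetical flag.', required: false, default: false)
puts row(flag, %w[Parameter Type Description])
# | `example_flag` | Boolean | A hypothetical flag. Default is `false`. |
puts row(flag, %w[Parameter Type Description Required Default])
# | `example_flag` | Boolean | A hypothetical flag. | _Optional_ | `false` |
```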

You can also customize this component with the following settings:

- `include_global` (Boolean; default is `false`): Includes global query parameters in the table.
- `include_deprecated` (Boolean; default is `true`): Includes deprecated parameters in the table.
- `pretty` (Boolean; default is `false`): Renders the table in the pretty format instead of the compact format.
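As a rough sketch of how `include_global` and `include_deprecated` narrow the table, assume each parameter carries `global` and `deprecated` flags. These attributes, the `QueryParam` struct, and the parameter names are all hypothetical. Under that assumption, the filter reduces to a single `select` that uses the documented defaults:

```ruby
# Hypothetical parameter record with the two flags the settings act on.
QueryParam = Struct.new(:name, :global, :deprecated, keyword_init: true)

# Defaults match the documented settings: include_global false, include_deprecated true.
def visible_params(params, include_global: false, include_deprecated: true)
  params.select do |param|
    (include_global || !param.global) && (include_deprecated || !param.deprecated)
  end
end

params = [
  QueryParam.new(name: 'local_param', global: false, deprecated: false),
  QueryParam.new(name: 'old_param', global: false, deprecated: true),
  QueryParam.new(name: 'global_param', global: true, deprecated: false)
]

p visible_params(params).map(&:name)
# => ["local_param", "old_param"]
p visible_params(params, include_global: true, include_deprecated: false).map(&:name)
# => ["local_param", "global_param"]
```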

The following snippet applies all three settings to the query parameters table:

```markdown
<!-- spec_insert_start
api: cat.indices
component: query_parameters
include_global: true
include_deprecated: false
pretty: true
-->
<!-- spec_insert_end -->
```

### Inserting path parameters

To insert the `indices.create` API path parameters table, use the following snippet:

```markdown
<!-- spec_insert_start
api: indices.create
component: path_parameters
-->
<!-- spec_insert_end -->
```

This table behaves identically to the query parameters table except that it does not accept the `include_global` argument.

### Inserting paths and HTTP methods

To insert paths and HTTP methods for the `search` API, use the following snippet:

```markdown
<!-- spec_insert_start
api: search
component: paths_and_http_methods
-->
<!-- spec_insert_end -->
```

### Ignoring files and folders

The `spec-insert` plugin ignores all files and folders listed under `exclude` in [`_config.yml`](./_config.yml), which is the same list of files and folders that Jekyll ignores.
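A minimal sketch of this filtering, assuming `exclude` entries are exact paths, folder names, or shell-style globs. Jekyll's own exclusion matching has more rules than this, so treat `excluded?` and the sample `exclude` list as hypothetical illustrations rather than the plugin's implementation or this repository's configuration.

```ruby
require 'yaml'

# Hypothetical sketch: true when `path` matches an `exclude` entry as an exact
# path, as a folder prefix, or as a shell-style glob.
def excluded?(path, exclude)
  exclude.any? do |pattern|
    path == pattern ||
      path.start_with?("#{pattern.chomp('/')}/") ||
      File.fnmatch?(pattern, path, File::FNM_PATHNAME)
  end
end

# Sample configuration, not the repository's real _config.yml.
config = YAML.safe_load(<<~CONFIG)
  exclude:
    - vendor
    - README.md
    - "*.txt"
CONFIG

exclude = config.fetch('exclude', [])
%w[docs/page.md vendor/gems/page.md README.md notes.txt].each do |file|
  puts "#{file}: #{excluded?(file, exclude) ? 'skipped' : 'processed'}"
end
```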

### CI/CD

The `spec-insert` plugin is run as part of the CI/CD pipeline to ensure that the API components are up to date in the documentation. This is performed through the [update-api-components.yml](.github/workflows/update-api-components.yml) GitHub Actions workflow, which creates a pull request containing the updated API components every Sunday.
43 changes: 29 additions & 14 deletions Gemfile
@@ -1,4 +1,9 @@
@@ -1,4 +1,9 @@
-source "http://rubygems.org"
+# frozen_string_literal: true
+
+source 'https://rubygems.org'
 
+# Manually add csv gem since Ruby 3.4.0 no longer includes it
+gem 'csv', '~> 3.0'
+
 # Hello! This is where you manage which Jekyll version is used to run.
 # When you want to use a different version, change it below, save the
@@ -8,12 +13,12 @@ source "http://rubygems.org"
 #
 # This will help ensure the proper Jekyll version is running.
 # Happy Jekylling!
-gem "jekyll", "~> 4.3.2"
+gem 'jekyll', '~> 4.3.2'
 
 # This is the default theme for new Jekyll sites. You may change this to anything you like.
-gem "just-the-docs", "~> 0.3.3"
-gem "jekyll-remote-theme", "~> 0.4"
-gem "jekyll-redirect-from", "~> 0.16"
+gem 'jekyll-redirect-from', '~> 0.16'
+gem 'jekyll-remote-theme', '~> 0.4'
+gem 'just-the-docs', '~> 0.3.3'
 
 # If you want to use GitHub Pages, remove the "gem "jekyll"" above and
 # uncomment the line below. To upgrade, run `bundle update github-pages`.
@@ -22,21 +27,31 @@ gem "jekyll-redirect-from", "~> 0.16"
 
 # If you have any plugins, put them here!
 group :jekyll_plugins do
-  gem "jekyll-last-modified-at"
-  gem "jekyll-sitemap"
+  gem 'jekyll-last-modified-at'
+  gem 'jekyll-sitemap'
+  gem 'jekyll-spec-insert', :path => './spec-insert'
 end
 
 # Windows does not include zoneinfo files, so bundle the tzinfo-data gem
-gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]
+gem 'tzinfo-data', platforms: %i[mingw mswin x64_mingw jruby]
 
 # Performance-booster for watching directories on Windows
-gem "wdm", "~> 0.1.0" if Gem.win_platform?
+gem 'wdm', '~> 0.1.0' if Gem.win_platform?
 
 # Installs webrick dependency for building locally
-gem "webrick", "~> 1.7"
-
+gem 'webrick', '~> 1.7'
 
 # Link checker
-gem "typhoeus"
-gem "ruby-link-checker"
-gem "ruby-enum"
+gem 'ruby-enum'
+gem 'ruby-link-checker'
+gem 'typhoeus'
+
+# Spec Insert
+gem 'activesupport', '~> 7'
+gem 'mustache', '~> 1'
+
+group :development, :test do
+  gem 'rspec'
+  gem 'rubocop', '~> 1.44', require: false
+  gem 'rubocop-rake', require: false
+end
1 change: 1 addition & 0 deletions README.md
@@ -3,6 +3,7 @@
 # About the OpenSearch documentation repo
 
 The `documentation-website` repository contains the user documentation for OpenSearch. You can find the rendered documentation at [opensearch.org/docs](https://opensearch.org/docs).
+The markdown files in this repository are rendered into HTML pages using [Jekyll](https://jekyllrb.com/). Check the [DEVELOPER_GUIDE](DEVELOPER_GUIDE.md) for more information about how to use Jekyll for this repository.
 
 
 ## Contributing
4 changes: 2 additions & 2 deletions _aggregations/bucket/nested.md
@@ -96,8 +96,8 @@ GET logs/_search
   "aggregations" : {
     "pages" : {
       "doc_count" : 2,
-      "min_price" : {
-        "value" : 200.0
+      "min_load_time" : {
+        "value" : 200
       }
     }
   }
2 changes: 1 addition & 1 deletion _analyzers/token-filters/condition.md
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: condition
+title: Condition
 parent: Token filters
 nav_order: 70
 ---
2 changes: 1 addition & 1 deletion _analyzers/token-filters/index.md
@@ -32,7 +32,7 @@ Token filter | Underlying Lucene token filter| Description
 `flatten_graph` | [FlattenGraphFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/core/FlattenGraphFilter.html) | Flattens a token graph produced by a graph token filter, such as `synonym_graph` or `word_delimiter_graph`, making the graph suitable for indexing.
 `hunspell` | [HunspellStemFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/hunspell/HunspellStemFilter.html) | Uses [Hunspell](https://en.wikipedia.org/wiki/Hunspell) rules to stem tokens. Because Hunspell supports a word having multiple stems, this filter can emit multiple tokens for each consumed token. Requires you to configure one or more language-specific Hunspell dictionaries.
 `hyphenation_decompounder` | [HyphenationCompoundWordTokenFilter](https://lucene.apache.org/core/9_8_0/analysis/common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilter.html) | Uses XML-based hyphenation patterns to find potential subwords in compound words and checks the subwords against the specified word list. The token output contains only the subwords found in the word list.
-`keep_types` | [TypeTokenFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/core/TypeTokenFilter.html) | Keeps or removes tokens of a specific type.
+[`keep_types`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/keep-types/) | [TypeTokenFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/core/TypeTokenFilter.html) | Keeps or removes tokens of a specific type.
 `keep_word` | [KeepWordFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/KeepWordFilter.html) | Checks the tokens against the specified word list and keeps only those that are in the list.
 `keyword_marker` | [KeywordMarkerFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/KeywordMarkerFilter.html) | Marks specified tokens as keywords, preventing them from being stemmed.
 `keyword_repeat` | [KeywordRepeatFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html) | Emits each incoming token twice: once as a keyword and once as a non-keyword.