From fabcb3b66b902ae15d37540b1c9343132135a8d6 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Fri, 22 Dec 2023 13:32:52 -0700
Subject: [PATCH 01/12] Add HTML strip processor documentation

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 86 ++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 _ingest-pipelines/processors/html-strip.md
diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
new file mode 100644
index 0000000000..d33533714d
--- /dev/null
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -0,0 +1,86 @@
+---
+layout: default
+title: HTML strip
+parent: Ingest processors
+nav_order: 140
+---
+
+# JSON processor
+
+The `html_strip` processor is used to <explain what is used to do>.
+
+The following is the syntax for the `html_strip` processor:
+
+```json
+<insert syntax example>
+```
+{% include copy-curl.html %}
+
+## Configuration parameters
+
+The following table lists the required and optional parameters for the `html_strip` processor.
+
+Parameter | Required/Optional | Description |
+|-----------|-----------|-----------|
+<insert the parameters>
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline.
+
+### Step 1: Create a pipeline
+
+The following query creates a pipeline, named <name>, that uses the `html_strip` processor to <do what?>: 
+
+```json
+<insert pipeline code example>
+```
+{% include copy-curl.html %}
+
+### Step 2 (Optional): Test the pipeline
+
+It is recommended that you test your pipeline before you ingest documents.
+{: .tip}
+
+To test the pipeline, run the following query:
+
+```json
+<insert code example>
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The following example response confirms that the pipeline is working as expected:
+
+```json
+<insert response example>
+```
+
+### Step 3: Ingest a document 
+
+The following query ingests a document into an index named `testindex1`:
+
+```json
+<insert code example>
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The request indexes the document into the index <index name> and will index all documents with <what does this response tell the user?>.
+
+```json
+<insert code example>
+```
+
+### Step 4 (Optional): Retrieve the document
+
+To retrieve the document, run the following query:
+
+```json
+<insert code example>
+```
+{% include copy-curl.html %}
+
+<Provide any other information and code examples relevant to the user or use cases.>
\ No newline at end of file

From 41a2bdf30b6cf15106707a2492085e0706dc6fac Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Fri, 22 Dec 2023 13:38:17 -0700
Subject: [PATCH 02/12] Add HTML strip processor documentation

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index d33533714d..d164bb5a83 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -5,7 +5,7 @@ parent: Ingest processors
 nav_order: 140
 ---
 
-# JSON processor
+# HTML strip processor
 
 The `html_strip` processor is used to <explain what is used to do>.
 

From de535f7b418cedae3dfb48f456661a93db702baa Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Wed, 22 May 2024 15:52:15 -0600
Subject: [PATCH 03/12] Add examples

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 103 ++++++++++++++++++---
 1 file changed, 91 insertions(+), 12 deletions(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index d164bb5a83..3279c58dea 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -7,7 +7,7 @@ nav_order: 140
 
 # HTML strip processor
 
-The `html_strip` processor is used to <explain what is used to do>.
+The `html_strip` processor removes HTML tags from string fields in incoming documents. The processor is useful when indexing data from web pages or other sources that may contain HTML markup. By removing the HTML tags, you can ensure that the indexed content is clean and easily searchable. HTML tags are replaced with newline characters (`\n`).
 
 The following is the syntax for the `html_strip` processor:
 
@@ -22,7 +22,14 @@ The following table lists the required and optional parameters for the `html_str
 
 Parameter | Required/Optional | Description |
 |-----------|-----------|-----------|
-<insert the parameters>
+`field` | Required | The string field from which to remove HTML tags.
+`target_field` | Optional | The field to assign the cleaned value to. If not specified, field is updated in-place.
+`ignore_missing` | Optional | Default is `false`. If `true`, the processor quietly exits without modifying the document when field does not exist.
+`description` | Optional | Description of the processor's purpose or configuration.
+`if` | Optional | Conditionally execute the processor.
+`ignore_failure` | Optional | Ignore failures for the processor. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
+`on_failure` | Optional | Handle failures for the processor. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
+`tag` | Optional | Identifier for the processor. Useful for debugging and metrics.
 
 ## Using the processor
 
@@ -30,10 +37,21 @@ Follow these steps to use the processor in a pipeline.
 
 ### Step 1: Create a pipeline
 
-The following query creates a pipeline, named <name>, that uses the `html_strip` processor to <do what?>: 
+The following query creates a pipeline named `strip-html-pipeline` that uses the `html_strip` processor to remove HTML tags from the `description` field and store the processed value in a new field named `cleaned_description`:
 
 ```json
-<insert pipeline code example>
+PUT _ingest/pipeline/strip-html-pipeline
+{
+  "description": "A pipeline to strip HTML from description field",
+  "processors": [
+    {
+      "html_strip": {
+        "field": "description",
+        "target_field": "cleaned_description"
+      }
+    }
+  ]
+}
 ```
 {% include copy-curl.html %}
 
@@ -45,7 +63,16 @@ It is recommended that you test your pipeline before you ingest documents.
 To test the pipeline, run the following query:
 
 ```json
-<insert code example>
+POST _ingest/pipeline/strip-html-pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "description": "This is a <b>test</b> description with <i>some</i> HTML tags."
+      }
+    }
+  ]
+}
 ```
 {% include copy-curl.html %}
 
@@ -54,33 +81,85 @@ To test the pipeline, run the following query:
 The following example response confirms that the pipeline is working as expected:
 
 ```json
-<insert response example>
+{
+  "docs": [
+    {
+      "doc": {
+        "_index": "_index",
+        "_id": "_id",
+        "_source": {
+          "description": "This is a <b>test</b> description with <i>some</i> HTML tags.",
+          "cleaned_description": "This is a test description with some HTML tags."
+        },
+        "_ingest": {
+          "timestamp": "2024-05-22T21:46:11.227974965Z"
+        }
+      }
+    }
+  ]
+}
 ```
+
 
 ### Step 3: Ingest a document 
 
-The following query ingests a document into an index named `testindex1`:
+The following query ingests a document into an index named `products`:
 
 ```json
-<insert code example>
+PUT products/_doc/1?pipeline=strip-html-pipeline
+{
+  "name": "Product 1",
+  "description": "This is a <b>test</b> product with <i>some</i> HTML tags."
+}
 ```
 {% include copy-curl.html %}
 
 #### Response
 
-The request indexes the document into the index <index name> and will index all documents with <what does this response tell the user?>.
+The response shows that the request has indexed the document into the index `products` and will index all documents with the `description` field containing HTML tags, while storing the cleaned version in the `cleaned_description` field.
 
 ```json
-<insert code example>
+{
+  "_index": "products",
+  "_id": "1",
+  "_version": 1,
+  "result": "created",
+  "_shards": {
+    "total": 2,
+    "successful": 1,
+    "failed": 0
+  },
+  "_seq_no": 0,
+  "_primary_term": 1
+}
 ```
+
 
 ### Step 4 (Optional): Retrieve the document
 
 To retrieve the document, run the following query:
 
 ```json
-<insert code example>
+GET products/_doc/1
 ```
 {% include copy-curl.html %}
 
-<Provide any other information and code examples relevant to the user or use cases.>
\ No newline at end of file
+#### Response
+
+The response includes both the original `description` field and the `cleaned_description` field with HTML tags removed.
+
+```json
+{
+  "_index": "products",
+  "_id": "1",
+  "_version": 1,
+  "_seq_no": 0,
+  "_primary_term": 1,
+  "found": true,
+  "_source": {
+    "cleaned_description": "This is a test product with some HTML tags.",
+    "name": "Product 1",
+    "description": "This is a <b>test</b> product with <i>some</i> HTML tags."
+  }
+}
+```
\ No newline at end of file

From c76e12fa5cedbd8cc2d7101c7de8e80b9b40ca8a Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Wed, 5 Jun 2024 14:48:01 -0600
Subject: [PATCH 04/12] Copy edits

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
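The revised `ignore_missing`, `on_failure`, and `tag` descriptions can be exercised together in a single processor. The following is a sketch, assuming the `description` field used in the page examples; the pipeline name `strip-html-safe` and the `html_strip_error` field are hypothetical:

```json
PUT _ingest/pipeline/strip-html-safe
{
  "description": "Hypothetical example: tolerate missing or non-string description fields",
  "processors": [
    {
      "html_strip": {
        "field": "description",
        "target_field": "cleaned_description",
        "ignore_missing": true,
        "tag": "strip-description",
        "on_failure": [
          {
            "set": {
              "field": "html_strip_error",
              "value": "description could not be stripped"
            }
          }
        ]
      }
    }
  ]
}
```

With this configuration, a document without a `description` field should pass through unchanged, and a document whose `description` is not a string should receive `html_strip_error` instead of being rejected.
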
 _ingest-pipelines/processors/html-strip.md | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index 3279c58dea..7a9c5665dc 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -12,7 +12,11 @@ The `html_strip` processor removes HTML tags from string fields in incoming docu
 The following is the syntax for the `html_strip` processor:
 
 ```json
-<insert syntax example>
+{
+  "html_strip": {
+    "field": "webpage"
+  }
+}
 ```
 {% include copy-curl.html %}
 
@@ -24,12 +28,12 @@ Parameter | Required/Optional | Description |
 |-----------|-----------|-----------|
 `field` | Required | The string field from which to remove HTML tags.
 `target_field` | Optional | The field to assign the cleaned value to. If not specified, field is updated in-place.
-`ignore_missing` | Optional | Default is `false`. If `true`, the processor quietly exits without modifying the document when field does not exist.
-`description` | Optional | Description of the processor's purpose or configuration.
-`if` | Optional | Conditionally execute the processor.
-`ignore_failure` | Optional | Ignore failures for the processor. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
-`on_failure` | Optional | Handle failures for the processor. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
-`tag` | Optional | Identifier for the processor. Useful for debugging and metrics.
+`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. Default is `false`.
+`description` | Optional | A description of the processor's purpose or configuration.
+`if` | Optional | Specifies to conditionally execute the processor.
+`ignore_failure` | Optional | Specifies whether to ignore processor failures. Default is `false`. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
+`on_failure` | Optional | Specifies a list of processors to run if the processor fails during execution. These processors are executed in the order they are specified. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
+`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type.
 
 ## Using the processor
 

From 5803bd33fc4c8166681f206dd2859ef05cc3e3c3 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 08:45:20 -0600
Subject: [PATCH 05/12] Update _ingest-pipelines/processors/html-strip.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index 7a9c5665dc..b0b6d03f72 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -7,7 +7,7 @@ nav_order: 140
 
 # HTML strip processor
 
-The `html_strip` processor removes HTML tags from string fields in incoming documents. The processor is useful when indexing data from web pages or other sources that may contain HTML markup. By removing the HTML tags, you can ensure that the indexed content is clean and easily searchable. HTML tags are replaced with newline characters (`\n`).
+The `html_strip` processor removes HTML tags from string fields in incoming documents. This processor is useful when indexing data from webpages or other sources that may contain HTML markup. By removing the HTML tags, you can ensure that the indexed content is clean and easily searchable. HTML tags are replaced with newline characters (`\n`).
 
 The following is the syntax for the `html_strip` processor:
 

From 764c597d79be537bb493a7e49244bcb1ccf9ce64 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:15:54 -0600
Subject: [PATCH 06/12] Update _ingest-pipelines/processors/html-strip.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index b0b6d03f72..13bb085c04 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `html_str
 Parameter | Required/Optional | Description |
 |-----------|-----------|-----------|
 `field` | Required | The string field from which to remove HTML tags.
-`target_field` | Optional | The field to assign the cleaned value to. If not specified, field is updated in-place.
+`target_field` | Optional | The field to assign the cleaned value to. If not specified, then the field is updated in-place.
 `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. Default is `false`.
 `description` | Optional | A description of the processor's purpose or configuration.
 `if` | Optional | Specifies to conditionally execute the processor.

From 0d6356aeb84f4a40ac32c87cd1ec24bd23557d37 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:16:04 -0600
Subject: [PATCH 07/12] Update _ingest-pipelines/processors/html-strip.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index 13bb085c04..4e14858339 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -120,7 +120,7 @@ PUT products/_doc/1?pipeline=strip-html-pipeline
 
 #### Response
 
-The response shows that the request has indexed the document into the index `products` and will index all documents with the `description` field containing HTML tags, while storing the cleaned version in the `cleaned_description` field.
+The response shows that the request has indexed the document into the index `products` and will index all documents with the `description` field containing HTML tags while storing the clean version in the `cleaned_description` field:
 
 ```json
 {

From 4d06d4db36a354c685deb3917615eef7dde5ca4a Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:16:11 -0600
Subject: [PATCH 08/12] Update _ingest-pipelines/processors/html-strip.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
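To return only the stripped text in Step 4, the `_source_includes` query parameter of the get document API can filter the returned source. The following is a sketch against the same `products` document:

```json
GET products/_doc/1?_source_includes=cleaned_description
```

The `_source` object in the response should then contain only `cleaned_description`.
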
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index 4e14858339..c21e432a56 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -150,7 +150,7 @@ GET products/_doc/1
 
 #### Response
 
-The response includes both the original `description` field and the `cleaned_description` field with HTML tags removed.
+The response includes both the original `description` field and the `cleaned_description` field with HTML tags removed:
 
 ```json
 {

From a2ab21e5569aa5d8fd58b5c65b2f54e97ed2fb96 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:16:39 -0600
Subject: [PATCH 09/12] Update _ingest-pipelines/processors/html-strip.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
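The processor appears to delegate to Lucene's `HTMLStripCharFilter`, which, as far as I know, emits `\n` for block-level tags such as `<p>` and `<br>` and drops inline tags such as `<b>` without a replacement. The following inline `_simulate` request is a quick way to confirm how tags are replaced before documenting the behavior; the sample markup is made up:

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "html_strip": {
          "field": "description"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "description": "<p>First paragraph</p><p>Second <b>bold</b> paragraph</p>"
      }
    }
  ]
}
```

If the assumption holds, the returned `description` contains newlines where the `<p>` tags were and no extra characters where `<b>` was.
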
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index c21e432a56..18f03f7161 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -7,7 +7,7 @@ nav_order: 140
 
 # HTML strip processor
 
-The `html_strip` processor removes HTML tags from string fields in incoming documents. This processor is useful when indexing data from webpages or other sources that may contain HTML markup. By removing the HTML tags, you can ensure that the indexed content is clean and easily searchable. HTML tags are replaced with newline characters (`\n`).
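+The `html_strip` processor removes HTML tags from string fields in incoming documents. This processor is useful when indexing data from webpages or other sources that may contain HTML markup. Block-level tags, such as `<p>`, `<div>`, and `<br>`, are replaced with newline characters (`\n`), while inline tags, such as `<b>` and `<i>`, are removed without a replacement character.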
 
 The following is the syntax for the `html_strip` processor:
 

From 2f7f17c29a7096ead5e81eb44bbb3128d1434b11 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:19:39 -0600
Subject: [PATCH 10/12] Update _ingest-pipelines/processors/html-strip.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
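The Step 3 request runs the pipeline only because it passes `?pipeline=strip-html-pipeline`. If every document indexed into `products` should be stripped, one option is the `index.default_pipeline` index setting. The following is a sketch, assuming the `products` index from the example already exists:

```json
PUT products/_settings
{
  "index.default_pipeline": "strip-html-pipeline"
}
```

After this setting is applied, an indexing request such as `PUT products/_doc/2` without the `pipeline` parameter should also populate `cleaned_description`.
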
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index 18f03f7161..c6cbc2cd1c 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -120,7 +120,7 @@ PUT products/_doc/1?pipeline=strip-html-pipeline
 
 #### Response
 
-The response shows that the request has indexed the document into the index `products` and will index all documents with the `description` field containing HTML tags while storing the clean version in the `cleaned_description` field:
+The response confirms that the document was indexed into the `products` index. Because the request specifies `pipeline=strip-html-pipeline`, the plain text version of the `description` field is stored in the `cleaned_description` field:
 
 ```json
 {

From f5a3d62b1e87fd3a153209500d6450c735082466 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:27:59 -0600
Subject: [PATCH 11/12] Update _ingest-pipelines/processors/html-strip.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index c6cbc2cd1c..6b814e7fe3 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `html_str
 Parameter | Required/Optional | Description |
 |-----------|-----------|-----------|
 `field` | Required | The string field from which to remove HTML tags.
-`target_field` | Optional | The field to assign the cleaned value to. If not specified, then the field is updated in-place.
+`target_field` | Optional | The field to receive the plain text version after stripping HTML tags. If not specified, then the field is updated in-place.
 `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. Default is `false`.
 `description` | Optional | A description of the processor's purpose or configuration.
 `if` | Optional | Specifies to conditionally execute the processor.

From 8fd346de9b4a26308dea8a42dfc95bbb4fa582a5 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 6 Jun 2024 09:29:58 -0600
Subject: [PATCH 12/12] Update _ingest-pipelines/processors/html-strip.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
---
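When `target_field` is omitted, the stripped text overwrites the source field. The following is a sketch of an in-place variant of the Step 1 pipeline; the pipeline name `strip-html-in-place` is hypothetical:

```json
PUT _ingest/pipeline/strip-html-in-place
{
  "description": "Hypothetical example: replace description with its plain text version",
  "processors": [
    {
      "html_strip": {
        "field": "description"
      }
    }
  ]
}
```

Documents ingested through this pipeline keep only the plain text in `description`; the original markup is not preserved anywhere in `_source`.
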
 _ingest-pipelines/processors/html-strip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_ingest-pipelines/processors/html-strip.md b/_ingest-pipelines/processors/html-strip.md
index 6b814e7fe3..ac33c45eae 100644
--- a/_ingest-pipelines/processors/html-strip.md
+++ b/_ingest-pipelines/processors/html-strip.md
@@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `html_str
 Parameter | Required/Optional | Description |
 |-----------|-----------|-----------|
 `field` | Required | The string field from which to remove HTML tags.
-`target_field` | Optional | The field to receive the plain text version after stripping HTML tags. If not specified, then the field is updated in-place.
+`target_field` | Optional | The field that receives the plain text version after stripping HTML tags. If not specified, then the field is updated in place.
 `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. Default is `false`.
 `description` | Optional | A description of the processor's purpose or configuration.
 `if` | Optional | Specifies to conditionally execute the processor.