updated docs #588

Merged
merged 5 commits into from
Feb 6, 2024
19 changes: 12 additions & 7 deletions docs/en/aws-deploy-elastic-serverless-forwarder.asciidoc
Expand Up @@ -276,15 +276,20 @@ These parameters define the permissions required in order to access the associat

[discrete]
=== Network
These parameters define the network settings for your environment.
To attach the Elastic Serverless Forwarder to a specific {aws} VPC, define both the security group IDs and the subnet IDs that belong to that {aws} VPC. This requirement comes from the https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-lambda-function-vpcconfig.html[CloudFormation VPCConfig property].

- `ElasticServerlessForwarderSecurityGroups`: Add a comma-delimited list of security group IDs to attach to the forwarder. Along with `ElasticServerlessForwarderSubnets`, these settings will define the {aws} VPC the forwarder will belong to. Leave blank if you don't want the forwarder to belong to any specific {aws} VPC.
- `ElasticServerlessForwarderSubnets`: Add a comma-delimited list of subnet IDs for the forwarder. Along with `ElasticServerlessForwarderSecurityGroups`, these settings will define the {aws} VPC the forwarder will belong to. Leave blank if you don't want the forwarder to belong to any specific {aws} VPC.
These are the parameters:

[NOTE]
====
If you are setting up an {aws} VPC for the forwarder, review the <<aws-serverless-troubleshooting-vpc-prerequisites,VPC prerequisites>>.
====
- `ElasticServerlessForwarderSecurityGroups`: Add a comma-delimited list of security group IDs to attach to the forwarder.
- `ElasticServerlessForwarderSubnets`: Add a comma-delimited list of subnet IDs for the forwarder.

Both parameters are required in order to attach the Elastic Serverless Forwarder to a specific {aws} VPC.
Leave both parameters blank if you don't want the forwarder to belong to any specific {aws} VPC.
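
The "both or neither" rule above can be checked before deploying. The following is a minimal sketch, not part of the forwarder itself: `parse_vpc_parameters` is a hypothetical helper, and the returned keys mirror the `SecurityGroupIds` and `SubnetIds` fields of the CloudFormation VPCConfig property.

```python
def parse_vpc_parameters(security_groups: str, subnets: str) -> dict:
    """Split the two comma-delimited parameters and check they are set together.

    Hypothetical helper for illustration only; the forwarder does not ship it.
    """
    sg_ids = [item.strip() for item in security_groups.split(",") if item.strip()]
    subnet_ids = [item.strip() for item in subnets.split(",") if item.strip()]

    # Both lists must be filled in, or both must be empty (no VPC attachment).
    if bool(sg_ids) != bool(subnet_ids):
        raise ValueError(
            "Set both ElasticServerlessForwarderSecurityGroups and "
            "ElasticServerlessForwarderSubnets, or leave both blank."
        )
    return {"SecurityGroupIds": sg_ids, "SubnetIds": subnet_ids}
```

Passing two empty strings returns two empty lists, which corresponds to a forwarder that is not attached to any specific {aws} VPC.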


If the Elastic Serverless Forwarder is attached to a VPC, you need to https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html[create VPC Endpoints] for S3 and SQS, and for *every* service you define as an input for the forwarder. S3 and SQS VPC Endpoints are always required for reading the `config.yaml` uploaded to S3 and managing the _Continuing queue_ and the _Replay queue_, regardless of the <<aws-serverless-forwarder-inputs>> used. If you use <<aws-serverless-forwarder-inputs-cloudwatch>> you need to create a VPC Endpoint for EC2 as well.

NOTE: Refer to the documentation for https://www.elastic.co/guide/en/cloud/current/ec-traffic-filtering-vpc.html[AWS PrivateLink traffic filters] to find your VPC endpoint ID and the hostname to use in the `config.yaml` to access your Elasticsearch cluster over PrivateLink.

[discrete]
[[aws-serverless-forwarder-deploy-sar]]
Expand Up @@ -376,7 +376,7 @@ NOTE: Note that you should escape the opening square bracket (`[`) in the regula
|===
| Setting for `negate` | Setting for `match` | Result | Example `pattern: ^b`
| `false` | `after` | Consecutive lines that match the pattern are appended to the previous line that doesn’t match. | image:images/false-after-multi.png[Lines a b b c b b become "abb" and "cbb"]
| `false` | `before` | Consecutive lines that match the pattern are prepended to the next line that doesn’t match. | image:images/false-after-multi.png[Lines b b a b b c become "bba" and "bbc"]
| `false` | `before` | Consecutive lines that match the pattern are prepended to the next line that doesn’t match. | image:images/false-before-multi.png[Lines b b a b b c become "bba" and "bbc"]
| `true` | `after` | Consecutive lines that don’t match the pattern are appended to the previous line that does match. | image:images/true-after-multi.png[Lines b a c b d e become "bac" and "bde"]
| `true` | `before` | Consecutive lines that don’t match the pattern are prepended to the next line that does match. | image:images/true-before-multi.png[Lines a c b d e b become "acb" and "deb"]
|===
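
The four combinations in the table can be reproduced with a short sketch. This is an illustration of the grouping rules only, not the forwarder's implementation: `group_multiline` is a hypothetical function, and lines are joined without a separator purely to mirror the table's examples.

```python
import re


def group_multiline(lines, pattern, negate, match):
    """Group lines following the negate/match rules described in the table.

    A line is a "continuation" when it matches the pattern (negate=False)
    or when it does not match the pattern (negate=True).
    """
    regex = re.compile(pattern)
    groups, current = [], []
    for line in lines:
        is_continuation = bool(regex.search(line)) != negate
        if match == "after":
            # Continuations are appended to the preceding non-continuation line,
            # so a non-continuation line starts a new group.
            if not is_continuation and current:
                groups.append(current)
                current = []
            current.append(line)
        elif match == "before":
            # Continuations are prepended to the next non-continuation line,
            # so a non-continuation line closes the current group.
            current.append(line)
            if not is_continuation:
                groups.append(current)
                current = []
        else:
            raise ValueError("match must be 'after' or 'before'")
    if current:
        groups.append(current)
    return ["".join(group) for group in groups]
```

For example, `group_multiline(list("abbcbb"), "^b", negate=False, match="after")` yields the two events shown in the first row of the table.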
64 changes: 62 additions & 2 deletions docs/en/aws-elastic-serverless-forwarder.asciidoc
Expand Up @@ -29,12 +29,16 @@ The Elastic Serverless Forwarder can forward {aws} data to cloud-hosted, self-ma
[role="screenshot"]
image::images/aws-serverless-lambda-flow.png[AWS Lambda flow]

When you successfully deploy the forwarder, an SQS _Continuing queue_ is automatically created in Lambda to ensure no data is lost. By default the forwarder runs for a maximum of 15 minutes so it's possible that {aws} may exit the function in the middle of processing event data. The forwarder handles this scenario by keeping track of the last offset processed. When the queue triggers a new function invocation, the forwarder will start where the last function run stopped.
Elastic Serverless Forwarder ensures <<aws-serverless-forwarder-at-least-once-delivery,at-least-once delivery>> of the forwarded message.

The forwarder uses a _Replay queue_ (also automatically created during deployment) to handle any ingestion-related exception or fail scenarios. Data in the replay queue is stored as individual events. Lambda keeps track of any failed events and writes them to a replay queue that can then be consumed by adding an additional SQS trigger via Lambda.
When you successfully deploy the forwarder, an SQS _Continuing queue_ is automatically created in Lambda to ensure no data is lost. By default, the forwarder runs for a maximum of 15 minutes, so it's possible that {aws} may exit the function in the middle of processing event data. The forwarder handles this scenario by keeping track of the last offset processed. When the queue triggers a new function invocation, the forwarder will start where the last function run stopped.

The forwarder uses a _Replay queue_ (also automatically created during deployment) to handle any ingestion-related exceptions or failures. Data in the _Replay queue_ is stored as individual events. Lambda keeps track of any failed events and writes them to the _Replay queue_, which can then be consumed by adding it as an additional SQS trigger via Lambda.

You can use the <<sample-s3-config-file,config.yaml>> file to configure the service for each input and output type, including information such as SQS queue ARN (Amazon Resource Name) and {es} or {ls} connection details. You can create multiple input sections within the configuration file to map different inputs to specific log types.

There is no need to define a specific input in the <<sample-s3-config-file,config.yaml>> for the _Continuing queue_ and the _Replay queue_.

The forwarder also supports writing directly to an index, alias, or custom data stream. This enables existing {es} users to re-use index templates, ingest pipelines, or dashboards that are already created and connected to other processes.

[discrete]
Expand Up @@ -79,6 +83,62 @@ The forwarder can ingest logs contained within the payload of an Amazon SQS body

You can set up a separate SQS queue for each type of log. The config parameter for {es} output `es_datastream_name` is mandatory. If this value is set to an {es} data stream, the type of log must be correctly defined with configuration parameters. A single configuration file can have many input sections, pointing to different SQS queues that match specific log types.

[discrete]
[[aws-serverless-forwarder-at-least-once-delivery]]
= At-least-once delivery
The Elastic Serverless Forwarder ensures at-least-once delivery of the forwarded messages by relying on the `Continuing queue` and the `Replay queue`.

[discrete]
[[aws-serverless-forwarder-at-least-once-delivery-continuing-queue]]
== Continuing queue

The Elastic Serverless Forwarder can run for a maximum of 15 minutes. Different inputs can trigger the Elastic Serverless Forwarder, with different payload sizes for each execution trigger. The size of the payload affects the number of events to be forwarded. The number of events is also affected by configuring <<aws-serverless-define-include-exclude-filters,include/exclude filters>>, <<expanding-events-from-json-object-lists, expanding events from JSON object lists>>, and <<aws-serverless-manage-multiline-messages,managing multiline messages>>.

The `Continuing queue` plays a role in ensuring at-least-once delivery when the maximum of 15 minutes is not sufficient to forward all the events resulting from a single execution of the Elastic Serverless Forwarder.

For this scenario, a grace period of two minutes is reserved before the 15-minute timeout to handle the events that are left to be processed.
At the beginning of this grace period, event forwarding is halted. The rest of the time is dedicated to sending a copy of the original messages that contain the remaining events to the `Continuing queue`.
This mechanism removes the need to handle partial processing of the trigger at the input level, which is not always possible (for example, in the case of <<aws-serverless-forwarder-inputs-cloudwatch>>) and which would force users to conform to a specific configuration of the {aws} resources used as inputs.

The messages in the `Continuing queue` contain metadata with the offsets from which to restart forwarding the events, and a reference to the original input.
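
The grace-period mechanism can be sketched as follows. This is a simplified illustration under stated assumptions, not the forwarder's code: `forward_until_grace_period`, `FakeLambdaContext`, and the `input_id` and `offset` metadata keys are hypothetical, and a plain list stands in for the SQS queue. Only `get_remaining_time_in_millis()` corresponds to a real method of the {aws} Lambda Python context object.

```python
import json

GRACE_PERIOD_MS = 2 * 60 * 1000  # two minutes reserved before the timeout


class FakeLambdaContext:
    """Stand-in for the Lambda context object, for local experiments only."""

    def __init__(self, remaining_ms):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self.remaining_ms


def forward_until_grace_period(events, start_offset, context, forward,
                               continuing_queue, input_id):
    """Forward events from start_offset until the grace period begins.

    Returns the offset of the first event that was NOT forwarded
    (len(events) when everything was forwarded).
    """
    for offset in range(start_offset, len(events)):
        if context.get_remaining_time_in_millis() <= GRACE_PERIOD_MS:
            # Halt forwarding and record where the next invocation resumes.
            # The metadata keys are hypothetical names for this sketch.
            continuing_queue.append(
                json.dumps({"input_id": input_id, "offset": offset})
            )
            return offset
        forward(events[offset])
    return len(events)
```

A later invocation triggered by the queue would read the stored offset and call the same function with it as `start_offset`, so no event before that offset is skipped.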

NOTE: You can remove a specific input as a trigger of the Elastic Serverless Forwarder. However, do not remove its definition from <<sample-s3-config-file,config.yaml>> until all the events generated while the input was still a trigger are fully processed, including the ones in the messages copied to the `Continuing queue`. Handling the messages in the `Continuing queue` requires a lookup of the original input in the `config.yaml`.

In the unlikely scenario that the Elastic Serverless Forwarder exceeds its maximum allocated execution time and is forcefully terminated, the `Continuing queue` will not be properly populated with a copy of all or some of the messages left to be processed. In this scenario, the messages might or might not be lost, depending on the specific {aws} resource used as input and its configuration.

An {aws} SQS https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html[Dead Letter Queue] (DLQ) is created for the `Continuing queue`.

When the Elastic Serverless Forwarder is triggered by the `Continuing queue`, in the unlikely scenario that it exceeds its maximum allocated execution time and is forcefully terminated, the messages in the payload that triggered the execution are not deleted from the `Continuing queue` and trigger another Elastic Serverless Forwarder execution. The `Continuing queue` is configured with a maximum of 3 receives before a message is sent to the DLQ.
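
In SQS terms, a limit of 3 receives corresponds to a redrive policy with `maxReceiveCount` set to 3. The deployment creates this configuration for you, so the sketch below is only meant to show what the setting looks like. `build_redrive_attributes` is a hypothetical helper and the ARN in the usage note is a placeholder.

```python
import json


def build_redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the SQS queue attributes that route messages to a DLQ.

    SQS expects the RedrivePolicy attribute as a JSON-encoded string.
    """
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    return {"RedrivePolicy": json.dumps(policy)}
```

For example, `build_redrive_attributes("arn:aws:sqs:eu-west-1:123456789012:example-dlq")` returns a dictionary of the shape accepted as queue attributes by the SQS API.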

[discrete]
[[aws-serverless-forwarder-at-least-once-delivery-replay-queue]]
== Replay queue

The Elastic Serverless Forwarder forwards events to the outputs defined for a specific input. Events to be forwarded are grouped in batches that can be configured according to the specific output.
Failures can happen when forwarding events to an output. Depending on the output type, a failure can affect either the whole batch of events or single events in the batch.
There are multiple possible reasons for a failure, including, but not limited to, network connectivity issues and the output service being unavailable or under stress.

The `Replay queue` plays a role in ensuring at-least-once delivery when a failure in forwarding an event happens.

For this, each time a batch of events is forwarded, a copy of every event in the batch that failed to be forwarded is sent to the `Replay queue`. Each message sent to the `Replay queue` contains exactly one event that failed to be forwarded.

You can enable the `Replay queue` as a trigger of the Elastic Serverless Forwarder to forward the events in the queue again.
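
The "one message per failed event" rule can be sketched as follows. The function name and the `input_id`, `output_type`, and `event` keys are hypothetical and chosen for illustration. The only details taken from this page are that each message carries exactly one failed event, plus references to the original input and output.

```python
import json


def build_replay_messages(batch, failed_indexes, input_id, output_type):
    """Return one JSON message per failed event in the batch.

    Events that were forwarded successfully produce no message.
    """
    return [
        json.dumps(
            {
                "input_id": input_id,        # reference to the original input
                "output_type": output_type,  # reference to the original output
                "event": batch[index],       # exactly one failed event
            }
        )
        for index in sorted(failed_indexes)
    ]
```

With a batch of three events where the second and third failed, the function returns two messages, each wrapping a single event.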

NOTE: It is up to you to enable and disable the `Replay queue` as a trigger of the Elastic Serverless Forwarder, depending on why the forwarding failures happened. In most cases, you should resolve the underlying issue that populates the `Replay queue` before trying to forward the events in the queue again. Depending on the nature and impact of the issue, forwarding the events again without solving it first will produce new failures and send the events back to the `Replay queue`. In some scenarios, such as the output service being under stress, it is recommended to disable the `Replay queue` as a trigger of the Elastic Serverless Forwarder, because continuing to forward the events could worsen the issue.

When the Elastic Serverless Forwarder is triggered by the `Replay queue` and no event fails to be forwarded during an execution, no message is explicitly deleted: the Elastic Serverless Forwarder execution succeeds, and the messages in the trigger payload are removed automatically from the `Replay queue`.
If any of the events fails to be forwarded again, all the messages in the trigger payload that contain events that didn't fail are deleted, and a specific expected exception is raised. This marks the Elastic Serverless Forwarder execution as failed, so that only the messages that failed again go back to the `Replay queue`.
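
The two outcomes described above can be sketched as follows. This is an illustration with hypothetical names (`handle_replay_trigger`, `ReplayHandlerError`, and the `id` and `body` message keys), and the `forward` and `delete_message` callables stand in for the real output and the SQS delete call.

```python
class ReplayHandlerError(Exception):
    """Expected exception that marks the execution as failed (hypothetical name)."""


def handle_replay_trigger(messages, forward, delete_message):
    """Forward every message again and apply the partial-failure rule."""
    failed_ids = set()
    for message in messages:
        try:
            forward(message["body"])
        except Exception:
            failed_ids.add(message["id"])

    if not failed_ids:
        # Success: nothing is deleted explicitly, because a successful
        # execution removes the whole trigger payload from the queue.
        return

    # Partial failure: delete only the messages that were forwarded, then
    # fail the execution so the remaining ones go back to the queue.
    for message in messages:
        if message["id"] not in failed_ids:
            delete_message(message["id"])
    raise ReplayHandlerError(f"{len(failed_ids)} event(s) failed again")
```

Raising after the explicit deletes is what keeps the successfully forwarded events from being redelivered when the failed execution returns the remaining messages to the queue.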

The messages in the `Replay queue` contain metadata with the references to the original input and the original output of the events.

NOTE: You can remove a specific input as a trigger of the Elastic Serverless Forwarder. However, do not remove its definition from <<sample-s3-config-file,config.yaml>> until all the events that failed to be ingested while the input was still a trigger are fully processed. Handling the messages in the `Replay queue` requires a lookup of the original input and output in the `config.yaml`.


An {aws} SQS https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html[Dead Letter Queue] (DLQ) is created for the `Replay queue`.

The same message can go back to the `Replay queue` up to three times. After that, it reaches the configured maximum of 3 receives and is sent to the DLQ.
A message can go back to the `Replay queue` either because it contains an event that failed to be forwarded again, according to the planned design, or in the unlikely scenario that the Elastic Serverless Forwarder triggered by the queue exceeds its maximum allocated execution time and is forcefully terminated. In this scenario, the messages will not be lost and will eventually be sent to the DLQ.

[discrete]
[[aws-serverless-forwarder-get-started]]
= Get started
7 changes: 1 addition & 6 deletions docs/en/aws-serverless-troubleshooting.asciidoc
Expand Up @@ -10,15 +10,10 @@
You can view the status of deployment actions and get additional information on events, including why a particular event fails (for example, misconfiguration details).

. On the Applications page for **serverlessrepo-elastic-serverless-forwarder**, click **Deployments**.
. You can view the **Deployment history** here and refresh the page for updates as the application deploys. It should take around 5 minutes to deploy &mdash; if the deployment fails for any reason, the create events will be rolled back and you will be able to see an explanation for which event failed.
. You can view the **Deployment history** here and refresh the page for updates as the application deploys. It should take around 5 minutes to deploy &mdash; if the deployment fails for any reason, the create events will be rolled back, and you will be able to see an explanation for which event failed.

NOTE: For example, if you don't increase the visibility timeout for an SQS queue as described in <<aws-serverless-forwarder-inputs-s3>>, you will see a `CREATE_FAILED` **Status** for the event, and the **Status reason** provides additional detail.

[discrete]
[[aws-serverless-troubleshooting-vpc-prerequisites]]
== Prerequisites when attached to a VPC
If the Elastic Serverless Forwarder is attached to a VPC, you need to https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html[create VPC Endpoints] for S3 and SQS, and for *every* service you define as an input for the forwarder. S3 and SQS VPC Endpoints are always required for reading the `config.yaml` uploaded to S3 and managing the _Continuing queue_ and the _Replay queue_, regardless of the <<aws-serverless-forwarder-inputs>> used.

[discrete]
[[preventing-unexpected-costs]]
== Preventing unexpected costs