updated docs #588

Merged
merged 5 commits on Feb 6, 2024
2 changes: 2 additions & 0 deletions docs/en/aws-deploy-elastic-serverless-forwarder.asciidoc
@@ -276,6 +276,8 @@ These parameters define the permissions required in order to access the associated

[discrete]
=== Network
It is not possible to directly define the ID of the {aws} VPC that the Elastic Serverless Forwarder lambda belongs to.
The {aws} VPC is defined instead by its security group IDs and subnet IDs.

Contributor:
It might be worth explaining why it is not possible, and adding a link to the AWS docs.

As an additional note, I do not think this information is enough to close out #488, as there are more pieces involved in making ESF work with PrivateLink apart from the VPC settings (e.g. VPC endpoints).

Contributor Author:
VPC endpoints are documented here

Let's move the content to this section! (and add the requirement for an EC2 VPC endpoint if CloudWatch Logs is used as input)

I will add a link to https://www.elastic.co/guide/en/cloud/current/ec-traffic-filtering-vpc.html as well

do you have any hint about what @Udayel mentioned in #488?

document few important parameters like BOTO_DISABLE_COMMONNAME=False

I remember it was something related to a version bump of the ESF boto dependency, but I never hit the issue and I'm not sure if it still applies.
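
For reference, a minimal sketch of the kind of interface VPC endpoint discussed above, written as CloudFormation YAML. The service name targets CloudWatch Logs, and every ID below is a placeholder; this is not taken from the actual ESF template.

[source,yaml]
----
Resources:
  LogsVpcEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      # Placeholder region; the CloudWatch Logs service endpoint,
      # relevant when CloudWatch Logs is used as input.
      ServiceName: com.amazonaws.eu-west-1.logs
      VpcId: vpc-0123456789abcdef0          # placeholder VPC ID
      SubnetIds:
        - subnet-0aaa1bbb2ccc3ddd4          # placeholder subnet ID
      SecurityGroupIds:
        - sg-0aaa1bbb2ccc3ddd4              # placeholder security group ID
      PrivateDnsEnabled: true
----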

These parameters define the network settings for your environment.

- `ElasticServerlessForwarderSecurityGroups`: Add a comma-delimited list of security group IDs to attach to the forwarder. Along with `ElasticServerlessForwarderSubnets`, these settings will define the {aws} VPC the forwarder will belong to. Leave blank if you don't want the forwarder to belong to any specific {aws} VPC.
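
As an illustration, these two parameters might be supplied when deploying the forwarder from the Serverless Application Repository through a CloudFormation template. This is only a sketch: the application ARN, version, and all IDs below are placeholders.

[source,yaml]
----
Resources:
  ElasticServerlessForwarder:
    Type: AWS::Serverless::Application
    Properties:
      Location:
        # Placeholder ARN and version: substitute the real SAR application for your region.
        ApplicationId: arn:aws:serverlessrepo:<region>:<account>:applications/elastic-serverless-forwarder
        SemanticVersion: <version>
      Parameters:
        # Comma-delimited lists; together these place the forwarder in the corresponding {aws} VPC.
        ElasticServerlessForwarderSecurityGroups: "sg-111aaa222bbb333cc,sg-444ddd555eee666ff"
        ElasticServerlessForwarderSubnets: "subnet-111aaa222bbb333cc,subnet-444ddd555eee666ff"
----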
@@ -376,7 +376,7 @@ NOTE: Note that you should escape the opening square bracket (`[`) in the regular
|===
| Setting for `negate` | Setting for `match` | Result | Example `pattern: ^b`
| `false` | `after` | Consecutive lines that match the pattern are appended to the previous line that doesn’t match. | image:images/false-after-multi.png[Lines a b b c b b become "abb" and "cbb"]
- | `false` | `before` | Consecutive lines that match the pattern are prepended to the next line that doesn’t match. | image:images/false-after-multi.png[Lines b b a b b c become "bba" and "bbc"]
+ | `false` | `before` | Consecutive lines that match the pattern are prepended to the next line that doesn’t match. | image:images/false-before-multi.png[Lines b b a b b c become "bba" and "bbc"]
| `true` | `after` | Consecutive lines that don’t match the pattern are appended to the previous line that does match. | image:images/true-after-multi.png[Lines b a c b d e become "bac" and "bde"]
| `true` | `before` | Consecutive lines that don’t match the pattern are prepended to the next line that does match. | image:images/true-before-multi.png[Lines a c b d e b become "acb" and "deb"]
|===
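
As an illustration, the first row of the table above (`negate: false`, `match: after`) might look like this in the forwarder's YAML configuration, assuming a `multiline` block of this shape under an input; the queue ARN is a placeholder:

[source,yaml]
----
inputs:
  - type: "s3-sqs"
    id: "arn:aws:sqs:eu-west-1:123456789012:placeholder-queue"  # placeholder ARN
    multiline:
      type: pattern
      pattern: '^b'   # the table's example pattern
      negate: false
      match: after    # consecutive matching lines are appended to the previous non-matching line
----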
20 changes: 20 additions & 0 deletions docs/en/aws-elastic-serverless-forwarder.asciidoc
@@ -79,6 +79,26 @@ The forwarder can ingest logs contained within the payload of an Amazon SQS body

You can set up a separate SQS queue for each type of log. The config parameter for {es} output `es_datastream_name` is mandatory. If this value is set to an {es} data stream, the type of log must be correctly defined with configuration parameters. A single configuration file can have many input sections, pointing to different SQS queues that match specific log types.
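
For example, a configuration of this shape could point two SQS inputs at different log types, each with its own `es_datastream_name`. This is a sketch only: the queue ARNs, {es} URL, and API key are placeholders.

[source,yaml]
----
inputs:
  - type: "sqs"
    id: "arn:aws:sqs:eu-west-1:123456789012:vpcflow-logs-queue"      # placeholder ARN
    outputs:
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://my-deployment.es.example.com:9243"  # placeholder URL
          api_key: "<api-key>"
          es_datastream_name: "logs-aws.vpcflow-default"
  - type: "sqs"
    id: "arn:aws:sqs:eu-west-1:123456789012:cloudtrail-logs-queue"   # placeholder ARN
    outputs:
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://my-deployment.es.example.com:9243"  # placeholder URL
          api_key: "<api-key>"
          es_datastream_name: "logs-aws.cloudtrail-default"
----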

[discrete]
[[at-least-once-delivery]]
== At-least-once delivery

Contributor:
I think this section might be easier to digest if we split it in two, providing details for the two queues in two separate sections/sub-sections (e.g. "More details about the Replay queue" or "Replay queue internals").
Each section can follow a pattern like the following:

  • What part this queue plays in ensuring at-least-once delivery
  • When ESF sends messages to this queue
  • Failures/DLQ scenarios

I would also add a note about the recent changes we made, removing `sqs.deleteMessage` calls, and their impact.


Elastic Serverless Forwarder ensures "at-least-once" delivery of the forwarded message.

Contributor:
I think this statement should be in the overview, where the Replay queue and Continuing queue are introduced for the first time.


The `Continuing queue` and `Replay queue` are programmatically populated by Elastic Serverless Forwarder.
This happens for the `Continuing quque` in the last two minutes of the lambda execution, if any message in the original trigger is left to be processed before that time (including the original triggers and the continuing and replay queues themselves).

Contributor:
Suggested change:
- This happens for the `Continuing quque` in the last two minutes of the lambda execution, if any message in the original trigger is left to be processed before that time (including the original triggers and the continuing and replay queues themselves).
+ This happens for the `Continuing queue` in the last two minutes of the lambda execution, if any message in the original trigger is left to be processed before that time (including the original triggers and the continuing and replay queues themselves).

Hmm, does "This" mean the creation of the queues?

Contributor Author:

No, "This" refers to "programmatically populated".

In the case of the `Replay queue`, this happens any time an event fails to be ingested (excluding an event already routed to the `Replay queue` itself, in case it fails again).

In the case of the `Continuing queue`, Elastic Serverless Forwarder can fail to process the messages only if the lambda times out or if an unexpected exception is raised during the lambda execution. In all other cases, the original message that triggered the lambda is deleted from the `Continuing queue`, regardless of whether the lambda processed it. If the message was not processed, a copy of the original message is sent to the `Continuing queue`: this resets the retry count, and the mechanism is in place to enforce at-least-once delivery. It should be rare to have messages in the continuing queue DLQ unless the lambda keeps timing out.

In the case of the `Replay queue`, the same considerations regarding lambda timeouts and unexpected exceptions apply.
When processing messages from the `Replay queue`, replayed messages that did not fail to be ingested again are deleted at the end of the lambda execution. Messages that fail to be ingested again are not deleted, and a specific expected exception is raised, which marks the lambda execution as failed. The outcome is that only the messages that failed again go back to the `Replay queue`. Since these are the same messages, their retry count increases; once it reaches the maximum value, they end up in the `Replay DLQ`, managed automatically by the {aws} SQS service.

The corresponding DLQs are handled automatically by the SQS service in {aws}: you can set the max retries of a queue, i.e., the total number of times the same message in a queue will trigger the lambda without being successfully processed. Once this limit is reached, the message goes to the corresponding DLQ.

For both the `Continuing queue` and the `Replay queue`, the max retries value is set to 3.
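
In SQS terms, this corresponds to a redrive policy with `maxReceiveCount: 3` on each queue. A minimal CloudFormation sketch (the resource names are illustrative, not the forwarder's actual ones):

[source,yaml]
----
Resources:
  ContinuingDLQ:
    Type: AWS::SQS::Queue
  ContinuingQueue:
    Type: AWS::SQS::Queue
    Properties:
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt ContinuingDLQ.Arn
        maxReceiveCount: 3  # after 3 failed receives, SQS moves the message to the DLQ
----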


[discrete]
[[aws-serverless-forwarder-get-started]]
= Get started