dead letter index: align error field to ECS and do not forward retryable errors #793
It looks good. Is there a reason why you left it as a draft? I see #799, which should be affected by these changes. I'm also wondering whether changing this to ECS format should require a version increase.
@constanca-m, I created #799 as an experiment; I guess we can focus on this one.
We have different fields depending on the error type, and some fields can have various types ( |
Quick recap:
|
Nice research, @zmoog! In that case, we should include a test in the code covering the different possible types.
We are targeting the following mappings:
{
  "mappings": {
    "dynamic": "false",
    "_data_stream_timestamp": {
      "enabled": true
    },
    "date_detection": false,
    "numeric_detection": false,
    "properties": {
      "error": {
        "properties": {
          "code": {
            "type": "keyword",
            "ignore_above": 1024
          },
          "id": {
            "type": "keyword",
            "ignore_above": 1024
          },
          "message": {
            "type": "match_only_text"
          },
          "stack_trace": {
            "type": "wildcard",
            "fields": {
              "text": {
                "type": "match_only_text"
              }
            }
          },
          "type": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    }
  }
}
We should probably at least provide a Here's a possible mapping between error conditions and the
When available, we may also consider adding an http.response.status_code field with the status code. I'm dumping all the options now; don't consider this a plan (yet).
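A minimal sketch of how a dead letter document could populate the ECS fields discussed above (error.message, error.type, and optionally http.response.status_code). The helper name `to_ecs_error` and its parameters are hypothetical and not taken from the PR; stringifying non-string reasons is an assumption based on the remark that some fields can have various types.

```python
import json
from typing import Any, Dict, Optional


def to_ecs_error(reason: Any, error_type: str, status_code: Optional[int] = None) -> Dict[str, Any]:
    """Build ECS-style error fields for a dead letter document (hypothetical helper)."""
    # error.message is mapped as text, so anything that is not already
    # a string is serialized to JSON first (assumption for this sketch).
    message = reason if isinstance(reason, str) else json.dumps(reason)

    doc: Dict[str, Any] = {"error": {"message": message, "type": error_type}}

    # Only add the HTTP status when one is available.
    if status_code is not None:
        doc["http"] = {"response": {"status_code": status_code}}

    return doc


example = to_ecs_error("Fail message", "fail_processor_exception", 500)
```

Leaving out the `http` object when no status is known keeps the absence of a status code detectable downstream.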
@zmoog Thank you for the comparisons!! What's the |
262d751
to
089b19e
Compare
@zmoog LGTM, thanks for tackling it! I'll defer the approval to @kaiyan-sheng since she had some comments.
Hey reviewers! Sorry for the noise; I just pushed the last commit with the docs update 🙇
@@ -230,6 +245,10 @@ For `elasticsearch` the following arguments are supported:
* `args.api_key`: API key of elasticsearch endpoint in the format `base64encode(api_key_id:api_key_secret)`. Mandatory when `args.username` and `args.password` are not provided. Will be ignored if `args.username`/`args.password` are defined.
* `args.es_datastream_name`: Name of data stream or index where logs should be forwarded to. Lambda supports automatic routing of various {aws} service logs to the corresponding data streams for further processing and storage in the {es} cluster. It supports automatic routing of `aws.cloudtrail`, `aws.cloudwatch_logs`, `aws.elb_logs`, `aws.firewall_logs`, `aws.vpcflow`, and `aws.waf` logs. For other log types, if using data streams, you can optionally set its value in the configuration file according to the naming convention for data streams and available integrations. If the `es_datastream_name` is not specified and it cannot be matched with any of the above {aws} services, then the value will be set to `logs-generic-default`. In versions **v0.29.1** and below, this configuration parameter was named `es_index_or_datastream_name`. Rename the configuration parameter to `es_datastream_name` in your `config.yaml` file on the S3 bucket to continue using it in the future version. The older name `es_index_or_datastream_name` is deprecated as of version **v0.30.0**. The related backward compatibility code is removed from version **v1.0.0**.
* `args.es_dead_letter_index`: Name of data stream or index where logs should be redirected to, in case indexing to `args.es_datastream_name` returned an error.
* `es_dead_letter_forward_errors`: List of errors that should be forwarded to the dead letter index. The default value is an empty list (forward all errors). If the list is not empty, only the errors in the list will be forwarded to the dead letter index.
In general, you can use values from the `error.type` field in the Elasticsearch API response. Here are a few examples of errors that can be forwarded to the dead letter index:
"In general, you can use values from the error.type"
What does this mean exactly? I see the condition to not be sent is:
( # no http status: connection error
self._es_dead_letter_forward_errors
and action["error"]["type"] not in self._es_dead_letter_forward_errors
):
I don't really understand the first part of the condition. It means you have to write the whole error to exclude it? Is there another way to exclude an error other than its error.type? @zmoog
Sorry, I had the feeling this part wasn't clear enough.
"In general, you can use values from the error.type"
The error.type in this sentence refers to the Elasticsearch API response.
For example, here's the Elasticsearch API response for
{
  "error": {
    "root_cause": [
      {
        "type": "fail_processor_exception",
        "reason": "Fail message"
      }
    ],
    "type": "fail_processor_exception",
    "reason": "Fail message"
  },
  "status": 500
}
_parse_error() picks error.type from the ES API response, so the shipper can later compare it with the values in the _es_dead_letter_forward_errors list.
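As a rough illustration of that step, here is a sketch that pulls error.type out of a response shaped like the one above and checks it against a configured list. The function `parse_error_type` is a hypothetical stand-in, not the actual `_parse_error()` from the PR, whose body is not shown here.

```python
from typing import Any, Dict, List, Optional


def parse_error_type(response: Dict[str, Any]) -> Optional[str]:
    """Return error.type from an ES API response, or None if absent (hypothetical helper)."""
    error = response.get("error")
    # The error field may not always be an object, so guard before reading "type".
    if isinstance(error, dict):
        return error.get("type")
    return None


# Response shaped like the example in this comment.
response = {
    "error": {
        "root_cause": [{"type": "fail_processor_exception", "reason": "Fail message"}],
        "type": "fail_processor_exception",
        "reason": "Fail message",
    },
    "status": 500,
}

# Stand-in for the configured es_dead_letter_forward_errors list.
forward_errors: List[str] = ["fail_processor_exception"]

error_type = parse_error_type(response)
is_listed = error_type in forward_errors
```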
I don't really understand the first part of the condition. It means you have to write the whole error to exclude it?
I don't get what you mean with "you have to write the whole error to exclude it".
Is there another way to exclude an error other than its error.type?
Users reported problems with the DLI forwarding all errors to ES.
This PR focuses on adding essential filtering capability:
- do not forward connection-related errors (we assumed errors without http.response.status_code are related to connection failures).
- filter out errors based on their error.type
There are probably more ways to filter the errors based on other criteria.
However, the Failure Store will ship in an upcoming ES release. So, it's better to give ESF what it needs today without re-implementing Failure Store features.
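The two filtering rules listed in this comment could be sketched as follows. This is only an illustration of the logic as described at this point in the review; the function name `should_forward_to_dli` and the shape of the `action` dictionary are assumptions, not the shipper's actual code.

```python
from typing import Any, Dict, List


def should_forward_to_dli(action: Dict[str, Any], forward_errors: List[str]) -> bool:
    """Decide whether a failed action goes to the dead letter index (hypothetical helper)."""
    # Rule 1: no HTTP status code is assumed to mean a connection failure,
    # so the error is not forwarded.
    status_code = action.get("http", {}).get("response", {}).get("status_code")
    if status_code is None:
        return False

    # Rule 2: when the list is not empty, forward only the listed error types.
    # An empty list means "forward all errors".
    if forward_errors and action["error"]["type"] not in forward_errors:
        return False

    return True


mapping_error = {
    "error": {"type": "mapper_parsing_exception"},
    "http": {"response": {"status_code": 400}},
}
connection_error = {"error": {"type": "connection_error"}}
```

With these inputs, `should_forward_to_dli(mapping_error, [])` forwards the document, while `should_forward_to_dli(connection_error, [])` does not, because the second action carries no status code.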
@constanca-m I tried to clarify the es_dead_letter_forward_errors option docs: does it flow better?
Yes, definitely! Thanks. What was causing my confusion was the term "in general", because it gave me the idea that there were other options for filtering errors besides the error type.
I will never ignore the feeling that a sentence isn't working again.
Thanks for raising the issue.
@zmoog Can we add the clarification that error.type is the one returned by the bulk() API? AFAIK, there's no exhaustive list of error types, but that should be a good starting point.
I'm trying!
ES shipper should only forward to DLI persistent errors like mapping exceptions.
@kaiyan-sheng, @constanca-m, @emilioalvap, I'm sorry for pushing yet another commit to this PR 🙇 Our tech leads asked to simplify the criteria to forward to DLI along these lines. The ES output should not forward the error to DLI when:
|
Sounds good!
for action in actions:
    if (
👍
@zmoog The documentation is no longer in this PR, did you remove it on purpose?
I removed it on purpose because the error types list no longer exists. But I should update the docs for the |
What does this PR do?
This PR brings the following changes:
- Aligns the error field in documents sent to the dead letter index (DLI) to the ECS format; the field now provides error.message and error.type.
- http.response.status_code field
- http.response.status_code)
Here is a sample error document from a mapping conflict:
Why is it important?
Checklist
CHANGELOG.md
Related issues