Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload developer notes: understanding errors and triggers #724

Merged
merged 2 commits into from
May 29, 2024

Conversation

constanca-m
Copy link
Contributor

What does this PR do?

This PR adds some notes to help the maintenance of ESF. Please read the file and the comments for more details.

Signed-off-by: constanca <[email protected]>
Signed-off-by: constanca <[email protected]>
- Ingestion errors cause the trigger `replay-sqs` to be activated
- Configuration errors use the trigger `sqs`, set by the user
- The `config.yaml` to consume messages affected by ingestion errors needs to have inputs for the replay queue and for the resource that produced those messages
- The input for the replay queue to consume ingestion errors is completely discarded. The output used will not be the one from the replay queue, but the one from the input that produced the message
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue. Why does the user specify outputs for the input replay queue if they are not used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

By adding an input to fetch messages from the replay queue (SQS), we can route messages back to ESF. However, these messages will use the original input, including the output, that produced them.

It makes sense to try to honor the original path, but it's also confusing to have an output setting that has no effect.

Could having a replay-sqs input type help reduce ambiguity? Also, making output optional—at least for this type—with override could also help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! Do we want the users to have the ability to change output when using replay-sqs?

- Configuration errors use the trigger `sqs`, set by the user
- The `config.yaml` to consume messages affected by ingestion errors needs to have inputs for the replay queue and for the resource that produced those messages
- The input for the replay queue to consume ingestion errors is completely discarded. The output used will not be the one from the replay queue, but the one from the input that produced the message
- There is no way to consume messages in the replay queue put there for configuration errors
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue. The current config.yaml should be used, not the one in the message attributes. This only means we get stuck in a loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's why the config is also stored in the message, right?

We probably need to reevaluate this behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are losing messages in this case?

Copy link
Contributor Author

@constanca-m constanca-m May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are losing messages in this case?

We are not loosing them because they are in the replay queue, and later get sent to the DLQ. But we cannot forward them to a custom output.

@constanca-m
Copy link
Contributor Author

Does it make sense to even set the replay queue as an input the same way we set the other inputs?

The input type of the replay queue should not be set by the user. If the outputs don't matter for the replay queue, then the user should not need to specify them either.

@constanca-m constanca-m self-assigned this May 28, 2024
@constanca-m constanca-m requested a review from a team May 28, 2024 15:29
@zmoog
Copy link
Contributor

zmoog commented May 28, 2024

Does it make sense to even set the replay queue as an input the same way we set the other inputs?

I guess the intent was probably to let users decide when and how to reprocess messages in the replay queue.

The input type of the replay queue should not be set by the user. If the outputs don't matter for the replay queue, then the user should not need to specify them either.

ESF creates the replay queue during the installation, right? To enable reprocessing the messages in the replay queue, users must manually create a trigger to pull the message to replay from the SQS queue to the lambda function, right?

@constanca-m
Copy link
Contributor Author

ESF creates the replay queue during the installation, right?

Yes.

To enable reprocessing the messages in the replay queue, users must manually create a trigger to pull the message to replay from the SQS queue to the lambda function, right?

Yes.

Copy link

@bturquet bturquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved to get the documentation update

@constanca-m constanca-m merged commit befdd23 into elastic:main May 29, 2024
4 checks passed
@constanca-m constanca-m deleted the understand-replay-sqs-trigger branch May 29, 2024 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants