Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove delete sqs message calls and optimise json collector decorator #534

Conversation

aspacca
Copy link
Contributor

@aspacca aspacca commented Nov 23, 2023

Enhancement

What does this PR do?

Remove calls to sqs.DeleteMessage that are the most expensive operation in the forwarder, and optimise JsonCollector decorator in order to parse as json the storage content only when strictly required

It also bumps the version of Localstack used to 3.x

Why is it important?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

…ent type, or with a events list expander from field)
@aspacca aspacca requested a review from girodav November 23, 2023 08:50
@aspacca aspacca self-assigned this Nov 23, 2023
Copy link

github-actions bot commented Nov 23, 2023

File Coverage Lines Branches Missing
All files 100% 100% 100%
main_aws.py 100% 100% 100%
handlers/__init__.py 100% 100% 100%
handlers/aws/__init__.py 100% 100% 100%
handlers/aws/cloudwatch_logs_trigger.py 100% 100% 100%
handlers/aws/exceptions.py 100% 100% 100%
handlers/aws/handler.py 100% 100% 100%
handlers/aws/kinesis_trigger.py 100% 100% 100%
handlers/aws/replay_trigger.py 100% 100% 100%
handlers/aws/s3_sqs_trigger.py 100% 100% 100%
handlers/aws/sqs_trigger.py 100% 100% 100%
handlers/aws/utils.py 100% 100% 100%
share/__init__.py 100% 100% 100%
share/config.py 100% 100% 100%
share/environment.py 100% 100% 100%
share/events.py 100% 100% 100%
share/expand_event_list_from_field.py 100% 100% 100%
share/factory.py 100% 100% 100%
share/include_exlude.py 100% 100% 100%
share/json.py 100% 100% 100%
share/logger.py 100% 100% 100%
share/multiline.py 100% 100% 100%
share/secretsmanager.py 100% 100% 100%
share/utils.py 100% 100% 100%
share/version.py 100% 100% 100%
shippers/__init__.py 100% 100% 100%
shippers/composite.py 100% 100% 100%
shippers/es.py 100% 100% 100%
shippers/factory.py 100% 100% 100%
shippers/logstash.py 100% 100% 100%
shippers/shipper.py 100% 100% 100%
storage/__init__.py 100% 100% 100%
storage/decorator.py 100% 100% 100%
storage/factory.py 100% 100% 100%
storage/payload.py 100% 100% 100%
storage/s3.py 100% 100% 100%
storage/storage.py 100% 100% 100%

Minimum allowed coverage is 100%

Generated by 🐒 cobertura-action against 670f763

@aspacca aspacca changed the title Remove delete sqs message calls and dptimise json collector decorator Remove delete sqs message calls and optimise json collector decorator Nov 23, 2023
girodav
girodav previously approved these changes Nov 28, 2023
@@ -72,9 +72,10 @@ def setUpClass(cls) -> None:
lgc = LogstashContainer(es_container=esc)
cls.logstash = lgc.start()

lsc = LocalStackContainer(image="localstack/localstack:1.4.0")
lsc = LocalStackContainer(image="localstack/localstack:3.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

We could use 3.0.1 here instead. We should have a look at how to automate these changes with dependabot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: It would be great, even as separate PR, to have a comment briefly explaining what this code does. The same applies for the overall json processing logic.

Copy link
Contributor

@girodav girodav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add a changelog entry as part of v1.11.0

dependabot bot and others added 3 commits November 28, 2023 12:39
Bumps [boto3](https://github.com/boto/boto3) from 1.28.80 to 1.33.1.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.28.80...1.33.1)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [boto3](https://github.com/boto/boto3) from 1.33.1 to 1.33.2.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.33.1...1.33.2)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@tommyers-elastic
Copy link

thanks for the PR! i'm still working through, but a couple of more general comments:

there's lots of changes here to the JSON logic - it would be really useful to have a detailed description of what has changed and why in the PR description.

additionally, in the future, it would be great to split the changes into seperate PRs to keep them logically seperate, and easier to review. here there are 3 changes: sqs delete message, localstack, and json parser.

@aspacca
Copy link
Contributor Author

aspacca commented Nov 30, 2023

additionally, in the future, it would be great to split the changes into seperate PRs to keep them logically seperate, and easier to review. here there are 3 changes: sqs delete message, localstack, and json parser.

👍

localstack is required to pass CI since upgrade of boto3, unluckily :|

as for the "sqs delete message", yes, there's already a separate branch for it: I've based my branch on it , since it's not yet merged to main it shows up in the diff: let me update the base branch

@aspacca aspacca changed the base branch from main to remove-sqs-delete-message-calls November 30, 2023 10:07
@@ -100,7 +100,7 @@ def test_collect_buffer_collect(
for i, line in enumerate(lines):
collect_buffer.grow(data=line, newline=newline)

content, content_length = collect_buffer.collect_and_reset()
content, content_length, _ = collect_buffer.collect_and_reset()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we validate the newline returned from collect_and_reset is correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we reset newline could be empty, that's why we ignore it and keep the one from the previous collect

@@ -432,26 +430,24 @@ def feed_iterator(content: bytes) -> Iterator[tuple[bytes, bytes, int]]:
b"\r\n",
b"line1\r\n line1.1\r\n line1.2\r\nline2\r\n line2.1\r\n line2.2\r\n",
[(b"line1", 7), (b" line1.1", 11), (b" line1.2", 11), (b"line2", 7), (b" line2.1", 11), (b" line2.2", 11)],
id="circuit breaker: \n",
id="circuit breaker: \r\n",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for this change in the input data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just the id of the test: the description that will be printed when running the test. It was wrong, since here we were testing \r\n in the content as newline separator, not \n

@aspacca
Copy link
Contributor Author

aspacca commented Dec 1, 2023

I will add unit test today for the decorators, then we can merge

@aspacca aspacca merged commit 4c76f27 into elastic:remove-sqs-delete-message-calls Dec 1, 2023
default_max_bytes: int = 10485760 # Default maximum number of bytes to return in one multi-line event
default_max_lines: int = 500 # Default maximum number of lines to return in one multi-line event
default_multiline_timeout: int = 5 # Default timeout in secs to finish a multi-line event.

timedelta_circuit_breaker: datetime.timedelta = datetime.timedelta(seconds=5)

# CollectTuple is a tuple representing the multilines bytes content, the length of the content and the newline found

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are really useful - thanks

aspacca pushed a commit that referenced this pull request Dec 6, 2023
…#544)

* Remove sqs delete record calls

* Remove delete sqs message calls and optimise json collector decorator (#534)

* parse the content as json only when really necessary (not single content type, or with a events list expander from field)

* make lint

* fix bugs

* bump localstack container

* add ec2 service to localstack

* fix comment

* fix typo

* reduce more complexity

* fix test for localstack 3.0.0

* fix lint

* further semplification and better performance

* remove debug printing

* remove debug printing

* clean get_by_lines unit test

* fix lint

* even simplier

* fix lint

* increase coverage

* Add ESF specific User-Agent header in outgoing Elasticsearch requests (#537)

* Add ESF specific User-Agent header in outgoing Elasticsearch requests
* Bump minimum supported Elastic stack version to 7.17

* Bump boto3 from 1.28.80 to 1.33.1 (#541)

Bumps [boto3](https://github.com/boto/boto3) from 1.28.80 to 1.33.1.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.28.80...1.33.1)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump boto3 from 1.33.1 to 1.33.2 (#542)

Bumps [boto3](https://github.com/boto/boto3) from 1.33.1 to 1.33.2.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.33.1...1.33.2)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* inline doc

* fix tests

* changelog

* add back empty newline in get_by_lines_parameters

* use type aliases

* add comments

* fix typo in comment

* add more comments

* add decorator unit tests

* fix decorator according to test expectations

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Davide Girardi <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update CHANGELOG.md

* revert not updating ending offset in ExpandEventListFromField

* add typing-extensions==4.8.0 to requirements

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: girodav <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Localstack >= 2.x
3 participants