fix parsing of noncompliant RFC3339 timestamps missing only a timezone #3346

Open
wants to merge 2 commits into
base: master
from

Conversation

gilbsgilbs

@gilbsgilbs gilbsgilbs commented Nov 26, 2024


@gilbsgilbs: There is no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.

  • /kind feature
  • /kind enhancement
  • /kind refactoring
  • /kind fix
  • /kind chore
  • /kind dependencies
I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.


@gilbsgilbs: There are no area labels on this PR. You can add as many areas as you see fit.

  • /area agent
  • /area local-api
  • /area cscli
  • /area appsec
  • /area security
  • /area configuration

@gilbsgilbs
Author

/kind fix

@gilbsgilbs
Author

/area agent

This commit fixes the parsing of timestamps that are RFC3339 compliant
except that they lack a timezone. This format (still valid ISO 8601, but
not RFC3339) seems to be quite widely used. Some CrowdSec parsers from
the hub had to deal with it and ended up always appending a "Z" to make
the timestamp UTC and therefore RFC3339 compliant again:

- authentik-logs: https://github.com/crowdsecurity/hub/blob/146659cd4ac19abfa87e39b5e5c0ec8bc4313bf8/parsers/s01-parse/firix/authentik-logs.yaml#L24
- redmine-logs: https://github.com/crowdsecurity/hub/blob/146659cd4ac19abfa87e39b5e5c0ec8bc4313bf8/parsers/s01-parse/LePresidente/redmine-logs.yaml#L22
- qbittorrent [not upstreamed yet]: https://github.com/crowdsecurity/hub/pull/1179/files#diff-ba102ec88ac5a804fd6acfac54bdae1778b44992ed8b550a011082a32e6f9b9cR32
  - I tried to be a bit cautious by checking if a timezone is already
    present before appending one because I suspected that the missing
    timezone might be due to a system configuration quirk rather than
    something intended by the developer and stable from one machine to
    another.
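
For illustration, the cautious variant of the workaround (append "Z" only
when no timezone is present) can be sketched in Go as below. This is not the
code used by the hub parsers linked above, which are YAML parser
definitions; `ensureZone` is a hypothetical helper name, and the sketch
assumes the input is otherwise a well-formed RFC3339-style timestamp.

```go
package main

import (
	"fmt"
	"time"
)

// ensureZone is a hypothetical helper (not part of CrowdSec). It returns ts
// unchanged when it already parses as RFC3339, and appends "Z" (UTC)
// otherwise. Caveat: a string that is malformed for another reason also gets
// a "Z" appended and will still fail to parse later.
func ensureZone(ts string) string {
	if _, err := time.Parse(time.RFC3339, ts); err == nil {
		return ts
	}
	return ts + "Z"
}

func main() {
	for _, ts := range []string{
		"2024-11-26T10:00:00",       // timezone-naive: gets a "Z"
		"2024-11-26T10:00:00Z",      // already UTC: unchanged
		"2024-11-26T10:00:00+01:00", // explicit offset: unchanged
	} {
		fmt.Println(ensureZone(ts))
	}
}
```

Every parser that meets this format would need to repeat such a check,
which is the fragility the paragraph below refers to.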

Handling this edge case at the parser level would make things less
fragile and would keep such workarounds from spreading across the hub.
I don't see any downside: it doesn't break any existing parsing, it
only adds support for more formats.

Also note that, per the specification, assuming UTC for a
timezone-naive timestamp is not strictly accurate. In theory, we should
use the "local" timezone… of the machine that initially emitted the log,
which is hard to figure out. But this tradeoff at least prevents parsing
errors, and it sounds like a reasonable default, since downstream can
still override this behavior by specifying a timezone explicitly to
remove any ambiguity.
@buixor buixor added this to the 1.6.5 milestone Dec 18, 2024