Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Add MurmurHash3 converter #34155

Closed

Conversation

kaisecheng
Copy link
Contributor

Description:

Add MurmurHash3 function to convert the target to a hexadecimal string of the murmurHash3 hash/digest
The hash result has cross checked with python and ruby

Link to tracking Issue:
#34077

Testing:

  • unit test
  • e2e test

Documentation:

  • readme
  • chlog

@kaisecheng kaisecheng requested a review from a team July 18, 2024 14:18
Copy link

linux-foundation-easycla bot commented Jul 18, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@crobert-1
Copy link
Member

Hello @kaisecheng, thanks for your PR! Just to let you know, I'm going to wait to review until the code owners of OTTL get a chance to triage the original issue to make sure it makes sense to add this functionality.

pkg/ottl/ottlfuncs/README.md Outdated Show resolved Hide resolved
pkg/ottl/ottlfuncs/README.md Outdated Show resolved Hide resolved
pkg/ottl/go.mod Outdated
@@ -10,6 +10,7 @@ require (
github.com/json-iterator/go v1.1.12
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal v0.105.0
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatatest v0.105.0
github.com/spaolacci/murmur3 v1.1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned in general about using spaolacci/murmur3. It has some open issues and PRs, but it appears to be dead at this point. Do you have thoughts of potentially using another package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

twmb/murmur3 could be an alternative. Initially, I was concerned about the license since it is not listed as BSD-3-Clause. However, after a closer look, it seems acceptable. I did a quick test, and it appears that the hash result from Sum128() requires the same byte flip as the current solution.
I don't see a significant benefit in switching the library, but I am open to change it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have with `spaolacci/murmur3 - there is an outstanding issue, which I believe is the likely cause of some flaky test failures on previous CI runs in this PR.

AFAICT, the twmb/murmur3 is a fork initially created to specifically fix this issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robbavey you are right. will change to twmb/murmur3


version := v128
if !args.Version.IsEmpty() {
if (args.Version.Get() != v32) && (args.Version.Get() != v128) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It might make sense to make this a switch statement with the error in the default case. This would allow the exhaustive linter to ensure we don't miss adding a new valid value in case of future changes.

Comment on lines 17 to 18
v32 = "32"
v128 = "128" // default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make these enum values? (To go along with my other nit about a switch statement)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an enum means the syntax to call the 32-bit version will be MurmurHash3("something", version=0). I think this is more confusing than MurmurHash3("something", version="32").


version := v128
if !args.Version.IsEmpty() {
if (args.Version.Get() != v32) && (args.Version.Get() != v128) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It might make sense to just put args.Version.Get() into a new variable here, and check that value, and set version to it if it's valid, rather than calling the method so many times for the same value.


// MurmurHash3HexString returns the hexadecimal representation of the hash in little-endian format.
// MurmurHash3, developed by Austin Appleby, is sensitive to endianness. Unlike some other languages like Python,
// which use little-endian for all architectures, the Go library `spaolacci/murmur3` has some open issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to directly reference the open issues being referenced. That way in the future we could remove this if they're ever fixed, and just to be able to quickly see the underlying issue.

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to continue discussing, via the issue, whether or not we need this function.

@github-actions github-actions bot added the cmd/otelcontribcol otelcontribcol command label Jul 30, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 14, 2024
If `version` is

- `v32_hash`: Uses 32-bit version and returns a signed integer hash.
- `v128_hash`: Use 128-bit version and returns an array of two signed integer hash.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `v128_hash`: Use 128-bit version and returns an array of two signed integer hash.
- `v128_hash`: Uses 128-bit version and returns an array of two signed integer hash.

@github-actions github-actions bot removed the Stale label Aug 24, 2024
Copy link
Contributor

github-actions bot commented Sep 8, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 8, 2024
@github-actions github-actions bot removed the Stale label Sep 19, 2024
Copy link
Contributor

github-actions bot commented Oct 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 3, 2024
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants