-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a BoundedTrie metric which is used to efficiently store and… #33385
Conversation
R: @robertwb |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieCell.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
} else if (other.root().isPresent() && other.singleton().isPresent()) { | ||
return this; | ||
} else { | ||
BoundedTrieNode combined = new BoundedTrieNode(asTrie()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/BoundedTrieResult.java
Outdated
Show resolved
Hide resolved
c9f106b
to
01ffe0f
Compare
5a5ddae
to
ebbe4f9
Compare
ebbe4f9
to
35e0a8c
Compare
@robertwb : Please have another look. Thank you very much. |
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/BoundedTrieResult.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
} | ||
|
||
/** Returns a new {@link BoundedTrieData} instance that is a deep copy of this instance. */ | ||
public synchronized BoundedTrieData getCumulative() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a BoundedTrieData over a Set to be consistent with other MetricData in java SDK
* | ||
* @param other The other {@link BoundedTrieData} to combine with. | ||
*/ | ||
public synchronized void combine(@Nonnull BoundedTrieData other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return a void rather than a reference to updated object because the object is mutable and returning a reference from here will then require unnecessary overhead of using AtomicReference in BoundedTrieCell to safely be able to handle combine call from mutli-threads.
The 2 failing checks |
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
...ogle-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
Outdated
Show resolved
Hide resolved
...ogle-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/beam/runners/dataflow/worker/MetricsToCounterUpdateConverter.java
Outdated
Show resolved
Hide resolved
runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableMetrics.java
Outdated
Show resolved
Hide resolved
@@ -277,7 +286,8 @@ public static class CommittedMetricTests extends SharedTestBase { | |||
UsesCounterMetrics.class, | |||
UsesDistributionMetrics.class, | |||
UsesGaugeMetrics.class, | |||
UsesStringSetMetrics.class | |||
UsesStringSetMetrics.class, | |||
UsesBoundedTrieMetrics.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we lose (ordinary) counter coverage for runners not yet supporting bounded trie metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so.
This (all inclusive) annotation is on testAllAttemptedMetrics
so only runner which supports all of them will run it. But the runner who do not support will not run testAllAttemptedMetrics
but they will run others like for example for counters (ordinary) there is a test method below testAttemptedCounterMetrics
which is annotated only with @Category({ValidatesRunner.class, UsesAttemptedMetrics.class, UsesCounterMetrics.class})
.
db4e944
to
9298dfa
Compare
@@ -94,6 +94,7 @@ dependencies { | |||
// io-kafka is only used in PTransform override so it is optional | |||
provided project(":sdks:java:io:kafka") | |||
implementation project(":sdks:java:io:google-cloud-platform") | |||
implementation project(":runners:core-java") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertwb I believe adding this dependency here is acceptable i.e. specialized runners can depend on core. Can you please confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
...ogle-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/beam/runners/dataflow/worker/MetricsToCounterUpdateConverter.java
Outdated
Show resolved
Hide resolved
@@ -277,7 +286,8 @@ public static class CommittedMetricTests extends SharedTestBase { | |||
UsesCounterMetrics.class, | |||
UsesDistributionMetrics.class, | |||
UsesGaugeMetrics.class, | |||
UsesStringSetMetrics.class | |||
UsesStringSetMetrics.class, | |||
UsesBoundedTrieMetrics.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so.
This (all inclusive) annotation is on testAllAttemptedMetrics
so only runner which supports all of them will run it. But the runner who do not support will not run testAllAttemptedMetrics
but they will run others like for example for counters (ordinary) there is a test method below testAttemptedCounterMetrics
which is annotated only with @Category({ValidatesRunner.class, UsesAttemptedMetrics.class, UsesCounterMetrics.class})
.
@robertwb: Gentle nudge. |
df8849c
to
f2969ef
Compare
@robertwb I verified that all changes are intended after rebasing. Also please squash them to retain just the first commit
Thanks. |
Failing test is on master fixed here: #33823 |
… aggregate a collection of string sequences (FQNs) with a limited size.
… mutable BoundedTrieData
…in fuzzy run tests
…l Dataflow java client support is ready
f2969ef
to
31b41b0
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #33385 +/- ##
============================================
+ Coverage 57.48% 59.08% +1.60%
- Complexity 1474 3239 +1765
============================================
Files 985 1156 +171
Lines 155918 176834 +20916
Branches 1076 3391 +2315
============================================
+ Hits 89630 104488 +14858
- Misses 64073 68979 +4906
- Partials 2215 3367 +1152
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
spotbugs fails on multi-threading for some cases where they are on purpose not synchronized. cc: @robertwb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll merge as soon as I verify all tests are passing.
… aggregate a collection of string sequences (FQNs) with a limited size.
It is recommended to review this PR by commits.
BoundedTrie is a space-saving way to store many string sequences (like FQN/file paths). It acts like a tree with branches, holding sequences within a size limit. It can efficiently add, combine, and search and perform trimming of children when the size increases beyond defined max.
Let's say we want to store these sequences, with a size limit of 3:
"folder1/file1.txt"
"folder1/file2.txt"
"folder2/file3.txt"
Here's how the BoundedTrie might look:
If we try to add "folder1/file4.txt", the trie might trim to "folder1", dropping all children to stay within the size limit.
This will be used to replace the StringSet metric for lineage tracking for very large lineage graphs to overcome the size limits.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.