-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-2186] Emit GoT GTEs to time WorkUnit
prep and to record volume of Work Discovery
#4089
Conversation
…discovered, after reworking `TemporalEventTimer.Factory` for use within an `Activity`
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4089 +/- ##
============================================
- Coverage 47.98% 42.95% -5.03%
+ Complexity 8373 2436 -5937
============================================
Files 1582 507 -1075
Lines 62712 21403 -41309
Branches 7105 2456 -4649
============================================
- Hits 30091 9194 -20897
+ Misses 29899 11272 -18627
+ Partials 2722 937 -1785 ☔ View full report in Codecov by Sentry. |
@@ -150,26 +156,28 @@ public GenerateWorkUnitsResult generateWorkUnits(Properties jobProps, EventSubmi | |||
protected List<WorkUnit> generateWorkUnitsForJobStateAndCollectCleanupPaths(JobState jobState, EventSubmitterContext eventSubmitterContext, Closer closer, | |||
Set<String> pathsToCleanUp) | |||
throws ReflectiveOperationException { | |||
// report (timer) metrics for "Work Discovery", *planning only* - NOT including WU prep, like serialization, `DestinationDatasetHandlerService`ing, etc. | |||
// IMPORTANT: for accurate timing, SEPARATELY emit `.createWorkPreparationTimer`, to record time prior to measuring the WU size required for that one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean here one GTE event for serialization step or GTE event for everything discovery serialization and all ?
and also would that GTE help us in anyway if we had , wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
originally, in AbstractJobLauncher
the "WU creation timer" measured only the planning -
gobblin/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java
Line 476 in 7dbeebf
workUnitsCreationTimer.stop(this.multiEventMetadataGenerator.getMetadata(this.jobContext, |
that is what's included in the GaaSJobObservabilityEvent
.
the timer for WU prep happens a bit later -
gobblin/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java
Line 549 in 7dbeebf
TimingEvent workUnitsPreparationTimer = |
so in this comment:
"Work Discovery", planning only - NOT including WU prep, like serialization, ...
I just meant that we're timing only planning/creation, not the preparation such as serialization.
as for WU serialization, there is no existing, historical event strictly for that. typically that only takes a long time when memory-constrained and GC-bound. although we could consider adding a new event to time that, for purposes of right-sizing, GC stats are more interesting than the duration it happens to take. if anything, the former is what I'd prioritize.
// IMPORTANT: send prior to `writeWorkUnits`, so the volume of work discovered (and bin packed) gets durably measured. even if serialization were to | ||
// exceed available memory and this activity execution were to fail, a subsequent re-attempt would know the amount of work, to guide re-config/attempt | ||
createWorkPreparedSizeDistillationTimer(wuSizeSummary, eventSubmitterContext).stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, if serialization takes time and if it accounts for more OOM issue then should we consider these both in separate activity and launched through one parent workflow as IIUC activity retry will do everything from beginning and discovered workunits will be generated again which can also lead to duplication of GTE, whats your opinion on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having WU planning separate from WU serialization is worth considering. that would allow for a re-attempt of only serialization w/o needing to rerun the planning. there's no concern in sending the GTE again on the re-attempt - that's not the motivation. instead the benefit would be to expedite the re-attempt by subsequently skipping a repeat of successful WU planning.
to separate into two activities we'd need to succeed in persisting some intermediate form of WU planning, so the WU planning activity could "pass input" to the WU serialization activity, as the two won't execute together - or possibly even on the same host. that intermediate form clearly ought to be more likely to succeed than serializing all the WUs themselves - the failure we're trying to address.
this major design choice awaits if we decide to pursue such larger rework.
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
GaaSJobObservabilityEvent
s for Gobblin-on-Temporal jobs presently have no values set for the fieldsjobPlanningStartTimestamp
andjobPlanningEndTimestamp
. This is because, in contrast with Gobblin-on-MR,GenerateWorkUnitsImpl
emits noTimingEvent.LauncherTimings.WORK_UNITS_CREATION
GTE to record such values.The historical impediment was that Temporal does not permit Activities to launch other Activities (only Workflows may do that).
TemporalEventTimer
, however, emitted its GTE on a separate Activity for reliability. This change reworksTemporalEventTimer.Factory
to be useable within anActivity
, by refactoring into complementary concrete realizations -TemporalEventTimer.WithinWorkflowFactory
andTemporalEventTimer.WithinActivityFactory
. The latter sets up direct GTE emission within the same (current)Activity
.The alternative of reworking
GenerateWorkUnitsImpl
from anActivity
into aWorkflow
would be challenging to accomplish, because "Work Discovery" may be quite long running - 15-30 mins is not uncommon for large full-initial-copy Iceberg-Distcp jobs - yet Temporal allows a workflow-task to run for a maximum of 2 minutes. (Since we wish to emit this timing GTE in the midst of Work Discovery, it would not be easy to have one sub-activity first performing all of Work Discovery and another subsequently doing GTE emission once the first concludes.)In addition, now that this refactoring happily unblocks GTE-emission from
GenerateWorkUnitsImpl
we were primed to emit new metadata (on another GTE), which is actually even more critical to emit in the midst of Work Discovery: one to record the volume ofWorkUnit
s discovered. To be useful, this must be done prior to WU serialization, which is memory-intensive and may OOM. If that should happen this GTE's durable measurement would inform re-configuration for any subsequent re-attempt.Accordingly, we preserve total size and counts, means, and medians of both original and bin packed WUs. This was inspired by a pair of log messages produced within
CopySource::getWorkUnits
:and
We often seek out these messages to diagnose and resolve failures during
WorkUnit
Discovery, but rather than merely logging, this durable emission sets up fully-automatic (machine-to-machine) right-sizing to orchestrate a potential re-attempt (in case of WU serialization OOM error).(Actual handling of this new GTE
TimingEvent#WORKUNITS_GENERATED_SUMMARY
metadata by GaaS will require a separate change, e.g. toKafkaAvroJobStatusMonitor#parseJobStatus
.)Tests
manual execution
Commits