
[GOBBLIN-2186] Emit GoT GTEs to time WorkUnit prep and to record volume of Work Discovery #4089

Merged
merged 2 commits into apache:master on Jan 2, 2025

Conversation

@phet (Contributor) commented Jan 1, 2025

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

GaaSJobObservabilityEvents for Gobblin-on-Temporal jobs presently have no values set for the fields jobPlanningStartTimestamp and jobPlanningEndTimestamp. This is because, in contrast with Gobblin-on-MR, GenerateWorkUnitsImpl emits no TimingEvent.LauncherTimings.WORK_UNITS_CREATION GTE to record such values.

The historical impediment was that Temporal does not permit Activities to launch other Activities (only Workflows may do that); TemporalEventTimer, however, emitted its GTE on a separate Activity for reliability. This change reworks TemporalEventTimer.Factory to be usable within an Activity, by refactoring it into complementary concrete realizations: TemporalEventTimer.WithinWorkflowFactory and TemporalEventTimer.WithinActivityFactory. The latter sets up direct GTE emission within the same (current) Activity.
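Conceptually, the split looks something like the following simplified sketch (the interface and class names here are illustrative stand-ins, not the actual gobblin-temporal types): a workflow-scoped factory routes GTE emission through a child Activity, while an activity-scoped factory emits in-line, since an Activity cannot launch another Activity.

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityMethod;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;

import java.time.Duration;
import java.util.Map;

// Hypothetical Activity used only to emit a GTE (the pre-existing "emit on a separate Activity" path)
@ActivityInterface
interface SubmitGteActivity {
  @ActivityMethod
  void submitTimingEvent(String eventName, Map<String, String> metadata);
}

// Hypothetical stand-in for whatever actually writes the GTE (e.g. an EventSubmitter-backed impl)
interface GteSubmitter {
  void submitTimingEvent(String eventName, Map<String, String> metadata);
}

final class TimerEmissionSketch {
  // Usable only from Workflow code: delegates emission to a separate Activity
  static void emitWithinWorkflow(String eventName, Map<String, String> metadata) {
    SubmitGteActivity activityStub = Workflow.newActivityStub(
        SubmitGteActivity.class,
        ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofMinutes(1)).build());
    activityStub.submitTimingEvent(eventName, metadata);
  }

  // Usable from Activity code (e.g. GenerateWorkUnitsImpl): emits directly within the current Activity
  static void emitWithinActivity(GteSubmitter submitter, String eventName, Map<String, String> metadata) {
    submitter.submitTimingEvent(eventName, metadata);
  }
}
```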

The alternative of reworking GenerateWorkUnitsImpl from an Activity into a Workflow would be challenging to accomplish, because "Work Discovery" may be quite long-running (15-30 mins is not uncommon for large full-initial-copy Iceberg-Distcp jobs), yet Temporal allows a workflow-task to run for a maximum of 2 minutes. (Since we wish to emit this timing GTE in the midst of Work Discovery, it would not be easy to have one sub-activity first perform all of Work Discovery and another subsequently emit the GTE once the first concludes.)

In addition, now that this refactoring happily unblocks GTE emission from GenerateWorkUnitsImpl, we are primed to emit new metadata (on another GTE) that is actually even more critical to capture in the midst of Work Discovery: the volume of WorkUnits discovered. To be useful, this must be emitted prior to WU serialization, which is memory-intensive and may OOM. Should that happen, this GTE's durable measurement would inform re-configuration for any subsequent re-attempt.

Accordingly, we preserve the total size, counts, means, and medians of both the original and the bin-packed WUs. This was inspired by a pair of log messages produced within CopySource::getWorkUnits:

Statistics for ConcurrentBoundedPriorityIterable: {ResourcePool: {softBound: [ ... ], hardBound: [ ...]},totalResourcesUsed: [ ... ], maxRequirementPerDimension: [entities: 231943.0, bytesCopied: 1.22419622769628E14], ... }

and

org.apache.gobblin.data.management.copy.CopySource - Bin packed work units. Initial work units: 27252, packed work units: 13175, max weight per bin: 500000000, max work units per bin: 100.

We often seek out these messages to diagnose and resolve failures during WorkUnit Discovery, but rather than merely logging them, this durable emission sets up fully automatic (machine-to-machine) right-sizing to orchestrate a potential re-attempt (in case of a WU serialization OOM error).

(Actual handling by GaaS of this new TimingEvent#WORKUNITS_GENERATED_SUMMARY GTE's metadata will require a separate change, e.g. to KafkaAvroJobStatusMonitor#parseJobStatus.)
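For a sense of the distillation involved, here is a rough sketch of computing the per-WorkUnit size statistics recorded on that GTE; the field and method names are guesses for illustration only, not the actual WorkUnitsSizeSummary definitions. One instance would be computed for the original WUs and another for the bin-packed WUs, both emitted before any serialization begins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WorkUnitsSizeStatsSketch {
  final long count;
  final long totalSizeBytes;
  final double meanSizeBytes;
  final double medianSizeBytes;

  private WorkUnitsSizeStatsSketch(long count, long totalSizeBytes, double meanSizeBytes, double medianSizeBytes) {
    this.count = count;
    this.totalSizeBytes = totalSizeBytes;
    this.meanSizeBytes = meanSizeBytes;
    this.medianSizeBytes = medianSizeBytes;
  }

  // Distill per-WorkUnit sizes (bytes) into count/total/mean/median - e.g. once for the original
  // WUs and once for the bin-packed WUs - to attach as GTE metadata prior to serialization
  static WorkUnitsSizeStatsSketch of(List<Long> workUnitSizes) {
    long count = workUnitSizes.size();
    long total = 0;
    for (long size : workUnitSizes) {
      total += size;
    }
    double mean = count == 0 ? 0.0 : (double) total / count;
    List<Long> sorted = new ArrayList<>(workUnitSizes);
    Collections.sort(sorted);
    double median;
    if (count == 0) {
      median = 0.0;
    } else if (count % 2 == 1) {
      median = sorted.get((int) (count / 2));
    } else {
      median = (sorted.get((int) (count / 2) - 1) + sorted.get((int) (count / 2))) / 2.0;
    }
    return new WorkUnitsSizeStatsSketch(count, total, mean, median);
  }
}
```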

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

manual execution

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

phet added 2 commits December 31, 2024 09:19
…discovered, after reworking `TemporalEventTimer.Factory` for use within an `Activity`
@codecov-commenter

Codecov Report

Attention: Patch coverage is 2.04082% with 48 lines in your changes missing coverage. Please review.

Project coverage is 42.95%. Comparing base (003590e) to head (8ce206d).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
...temporal/workflows/metrics/TemporalEventTimer.java 0.00% 20 Missing ⚠️
...obblin/temporal/ddm/work/WorkUnitsSizeSummary.java 0.00% 13 Missing ⚠️
...poral/ddm/activity/impl/GenerateWorkUnitsImpl.java 0.00% 7 Missing ⚠️
...oral/ddm/workflow/impl/CommitStepWorkflowImpl.java 0.00% 2 Missing ⚠️
.../ddm/workflow/impl/ExecuteGobblinWorkflowImpl.java 33.33% 2 Missing ⚠️
...gobblin/temporal/workflows/metrics/EventTimer.java 0.00% 2 Missing ⚠️
...dm/workflow/impl/ProcessWorkUnitsWorkflowImpl.java 0.00% 1 Missing ⚠️
...ral/workflows/helloworld/GreetingWorkflowImpl.java 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (003590e) and HEAD (8ce206d): HEAD has 1 upload less than BASE (BASE: 2 uploads, HEAD: 1 upload).
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4089      +/-   ##
============================================
- Coverage     47.98%   42.95%   -5.03%     
+ Complexity     8373     2436    -5937     
============================================
  Files          1582      507    -1075     
  Lines         62712    21403   -41309     
  Branches       7105     2456    -4649     
============================================
- Hits          30091     9194   -20897     
+ Misses        29899    11272   -18627     
+ Partials       2722      937    -1785     


@@ -150,26 +156,28 @@ public GenerateWorkUnitsResult generateWorkUnits(Properties jobProps, EventSubmi
protected List<WorkUnit> generateWorkUnitsForJobStateAndCollectCleanupPaths(JobState jobState, EventSubmitterContext eventSubmitterContext, Closer closer,
Set<String> pathsToCleanUp)
throws ReflectiveOperationException {
// report (timer) metrics for "Work Discovery", *planning only* - NOT including WU prep, like serialization, `DestinationDatasetHandlerService`ing, etc.
// IMPORTANT: for accurate timing, SEPARATELY emit `.createWorkPreparationTimer`, to record time prior to measuring the WU size required for that one
Member

Do you mean here one GTE event for the serialization step, or a GTE event for everything (discovery, serialization, and all)?
And also, would that GTE help us in any way if we had it? wdyt?

@phet (Contributor, Author) commented Jan 2, 2025

originally, in AbstractJobLauncher the "WU creation timer" measured only the planning -

workUnitsCreationTimer.stop(this.multiEventMetadataGenerator.getMetadata(this.jobContext,

that is what's included in the GaaSJobObservabilityEvent.

the timer for WU prep happens a bit later -

so in this comment:

"Work Discovery", planning only - NOT including WU prep, like serialization, ...

I just meant that we're timing only planning/creation, not the preparation such as serialization.

as for WU serialization, there is no existing, historical event strictly for that. typically it only takes a long time when memory-constrained and GC-bound. although we could consider adding a new event to time it, for right-sizing purposes GC stats are more interesting than whatever duration it happens to take; if anything, the former is what I'd prioritize.

Comment on lines +133 to +135
// IMPORTANT: send prior to `writeWorkUnits`, so the volume of work discovered (and bin packed) gets durably measured. even if serialization were to
// exceed available memory and this activity execution were to fail, a subsequent re-attempt would know the amount of work, to guide re-config/attempt
createWorkPreparedSizeDistillationTimer(wuSizeSummary, eventSubmitterContext).stop();
Member

I think, if serialization takes time and accounts for more of the OOM issue, then should we consider putting these both in separate activities launched through one parent workflow? As IIUC, an activity retry will redo everything from the beginning, and the discovered workunits will be generated again, which can also lead to duplication of the GTE. What's your opinion on this?

@phet (Contributor, Author)

having WU planning separate from WU serialization is worth considering. that would allow for a re-attempt of only serialization w/o needing to rerun the planning. there's no concern in sending the GTE again on the re-attempt - that's not the motivation. instead the benefit would be to expedite the re-attempt by subsequently skipping a repeat of successful WU planning.

to separate into two activities we'd need to succeed in persisting some intermediate form of WU planning, so the WU planning activity could "pass input" to the WU serialization activity, as the two won't execute together - or possibly even on the same host. that intermediate form clearly ought to be more likely to succeed than serializing all the WUs themselves - the failure we're trying to address.

this major design choice awaits, should we decide to pursue such a larger rework.
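to make the idea concrete, a bare-bones sketch of such a parent workflow follows (all names hypothetical, and the intermediate-form persistence is hand-waved as a returned URI): the planning activity persists its output and hands a handle to a separate serialization activity, so a serialization retry need not repeat planning.

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

import java.time.Duration;

@ActivityInterface
interface PlanWorkUnitsActivity {
  // Runs "Work Discovery" and persists a compact intermediate form; returns its location
  String planAndPersistIntermediate(String jobPropsUri);
}

@ActivityInterface
interface SerializeWorkUnitsActivity {
  // Loads the intermediate form and performs the memory-intensive WU serialization
  int serializeFromIntermediate(String intermediateUri);
}

@WorkflowInterface
interface GenerateWorkUnitsWorkflowSketch {
  @WorkflowMethod
  int generate(String jobPropsUri);
}

class GenerateWorkUnitsWorkflowSketchImpl implements GenerateWorkUnitsWorkflowSketch {
  private final ActivityOptions opts =
      ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofHours(1)).build();
  private final PlanWorkUnitsActivity planner =
      Workflow.newActivityStub(PlanWorkUnitsActivity.class, opts);
  private final SerializeWorkUnitsActivity serializer =
      Workflow.newActivityStub(SerializeWorkUnitsActivity.class, opts);

  @Override
  public int generate(String jobPropsUri) {
    // planning succeeds (at most) once; a retry of serialization would not re-run it
    String intermediateUri = planner.planAndPersistIntermediate(jobPropsUri);
    return serializer.serializeFromIntermediate(intermediateUri);
  }
}
```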

@phet phet merged commit 3269764 into apache:master Jan 2, 2025
6 checks passed