Automatic Splitting: TaskWorker
- For the probe stage, probe jobs have a job id of the form 0-[1-9]+, and are submitted by a DAG named RunJobs.dag.
- In the processing stage, jobs have ids [1-9][0-9]+, and are contained in RunJobs0.dag.
- Finally, for every tail process (numbered n, n>0), job ids are n-[1-9][0-9]+, and are in RunJobsn.dag.
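As an illustration of the naming scheme above, the following sketch maps a job id to its stage and DAG file. The function name `classify_job_id` and the stage labels are hypothetical, not part of the TaskWorker code; the patterns are copied from the list as written, and the tail number n is assumed to be a positive decimal integer.

```python
import re

# Patterns taken verbatim from the list above.
PROBE_RE = re.compile(r"0-[1-9]+")
PROCESSING_RE = re.compile(r"[1-9][0-9]+")
# Assumption: the tail number n is written as a positive decimal integer.
TAIL_RE = re.compile(r"([1-9][0-9]*)-[1-9][0-9]+")


def classify_job_id(job_id):
    """Return (stage, dag_file) for a job id, or None if no pattern matches."""
    if PROBE_RE.fullmatch(job_id):
        return ("probe", "RunJobs.dag")
    if PROCESSING_RE.fullmatch(job_id):
        return ("processing", "RunJobs0.dag")
    match = TAIL_RE.fullmatch(job_id)
    if match:
        # The DAG file name carries the tail number n.
        return ("tail", "RunJobs{}.dag".format(match.group(1)))
    return None


print(classify_job_id("0-3"))   # probe job
print(classify_job_id("42"))    # processing job
print(classify_job_id("2-17"))  # job of the second tail
```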
The following lists the parameters for automatic splitting that are used in the code with their default values:
config.TaskWorker.minAutomaticRuntimeMins = 180
config.TaskWorker.numAutomaticProbes = 5
config.TaskWorker.minAutomaticTailSize = 100
config.TaskWorker.minAutomaticTailTriggers = [50, 80, 100]
The minimum runtime for a job is taken to be 3 hours (180 minutes), as set by minAutomaticRuntimeMins. The number of splitting probe jobs is controlled by numAutomaticProbes.
For tail jobs, if fewer than minAutomaticTailSize processing jobs are present, a single tail DAG will be created. Otherwise, minAutomaticTailTriggers lists the percentages of completed processing jobs that trigger the creation of a tail DAG. With the defaults above, the first tail DAG starts when 50% of the processing jobs have completed, the next at 80%, and the final one at 100%.
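The trigger rule can be sketched as a small function that returns the number of completed processing jobs at which each tail DAG would be created. This is only an illustration: `tail_trigger_counts` is a hypothetical helper, rounding the percentages up to whole jobs is an assumption, and so is the choice to start the single tail DAG of a small task once all processing jobs are done.

```python
def tail_trigger_counts(n_processing, min_tail_size=100, triggers=(50, 80, 100)):
    """Return the completed-job counts at which a tail DAG would be created.

    Defaults mirror minAutomaticTailSize and minAutomaticTailTriggers.
    """
    if n_processing < min_tail_size:
        # Small task: one tail DAG only. Assumption: it starts after all
        # processing jobs have completed.
        return [n_processing]
    # Integer ceiling of n_processing * pct / 100 (rounding up is an assumption).
    return [-(-n_processing * pct // 100) for pct in triggers]


print(tail_trigger_counts(200))  # [100, 160, 200]
print(tail_trigger_counts(30))   # [30]
```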
The timing of the probe and tail jobs can be controlled via:
config.TaskWorker.automaticProbeRuntimeMins = 15
config.TaskWorker.automaticTailRuntimeMinimumMins = 45
config.TaskWorker.automaticTailRuntimeFraction = 0.2
By default, probes run for 15 minutes, and tails run for either 20% of the user-set processing runtime or 45 minutes, whichever is longer.
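The tail runtime rule amounts to taking the larger of two values. A minimal sketch, assuming all runtimes are expressed in minutes (`tail_runtime_mins` is a hypothetical helper, not a TaskWorker function):

```python
def tail_runtime_mins(processing_runtime_mins, fraction=0.2, minimum_mins=45):
    """Return the tail job runtime in minutes.

    Defaults mirror automaticTailRuntimeFraction and
    automaticTailRuntimeMinimumMins.
    """
    return max(fraction * processing_runtime_mins, minimum_mins)


# A user-set runtime of 480 minutes gives tails of about 96 minutes (20%).
print(tail_runtime_mins(480))
# With the 180-minute minimum runtime, 20% is only 36 minutes,
# so the 45-minute floor applies.
print(tail_runtime_mins(180))
```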
To prevent the generated jobs from failing due to excessive disk usage, a cap of 5 GB per job is put in place; it can be configured:
config.TaskWorker.automaticOutputSizeMaximum = 5 * 1000**3
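Note that the default is written with powers of 1000, so the cap is 5 GB in decimal units, not 5 GiB. The sketch below only illustrates that arithmetic, assuming the value is in bytes; `exceeds_output_cap` is a hypothetical helper and does not reflect how the TaskWorker enforces the limit.

```python
# Default cap from the configuration above, assumed to be in bytes.
AUTOMATIC_OUTPUT_SIZE_MAXIMUM = 5 * 1000**3  # 5,000,000,000 bytes (5 GB)


def exceeds_output_cap(estimated_bytes, cap=AUTOMATIC_OUTPUT_SIZE_MAXIMUM):
    """Return True if an estimated per-job output size is above the cap."""
    return estimated_bytes > cap


print(exceeds_output_cap(4 * 1000**3))  # 4 GB is under the cap
print(exceeds_output_cap(5 * 1024**3))  # 5 GiB is slightly over 5 GB
```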