Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: shell: refactor output plugin and enable per-node/task output files #6539

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 7, 2025

This PR refactors the job shell output plugin with the intent of making per-node/task output files possible.

This PR is currently marked as a WIP because I think eventually the incremental refactor commits should be all squashed into a single "rewrite output plugin" commit, as I'm not sure the separate commits would offer that much in the git history. However, I left them here for now because it might be slightly easier to review the process (though hopefully it doesn't make a review more confusing)

The refactor splits the output plugin into components now located under src/shell/output/. The main source of new functionality in the "task output list" abstraction found in src/shell/output/task.[ch], which allows each task to separately open an output file and write data to it. Per-shell and per-task output files are supported simply by specifying a node or task-specific mustache tag in the --output or --error template. Tasks then either write directly to a file if they've opened one, or to the shell output service if shell_rank > 0, or directly to the KVS eventlogger on rank0.

The output plugin also has to handle log messages (see shell/output/log.[ch] and shell.output plugin calls (currently only made by the pty plugin). Log messages are written to the same destination as the local task index=0, while shell.output data is written to the appropriate task in the task output list.

Other high-level changes (obvious when going through the commits)

  • The unused term (terminal) output mode is removed
  • jobspec option parsing is greatly simplified and (mostly) contained in shell/output/conf.[ch]
  • The intermediate JSON output array is removed (was no longer necessary)
  • reference counting is simplified
  • generally a few less unnecessary pack/unpack and ioencode/iodecode calls in the common cases

A couple other potential issues:

  • Though RFC 24 allows for per-rank "redirect" events, this PR opts to not implement that, since adding a redirect event per task on large jobs could thwart the point of redirecting output away from the KVS in the first place. Instead, a redirect event without any per-rank or per-task templates substituted is used, e.g.:
    $ flux run -N4 -n 16 --output=flux-{{id}}.{{task.id}} hostname
    [0-15]: stdout redirected to flux-ƒo4c9DXvNT.{{task.id}}
    [0-15]: stderr redirected to flux-ƒo4c9DXvNT.{{task.id}}
    This seemed like a good compromise for now.
  • The job shell needs to determine if an output template is "per shell" or not. It does this by checking if the template renders differently for a different rank or task. This test isn't fool-proof though, because shell plugins are allowed to extend the set of supported mustache tags, and there may be no way to determine if these tags will render differently on different tasks/shell ranks from a single shell. (For instance, {{tmpdir}} tag has to be treated as a special case now, though it is unlikely anyone would use that tag in their output template).

grondo added 4 commits January 7, 2025 00:33
Problem: If a pre-exec hook generates a lot of output and fills stdio
buffers, then the subprocess child could block inside the pre-exec
hook. Since the parent is waiting for the sync fd to close, this
could cause a deadlock.

Set the stdout and stderr fds to non-blocking for the duration of
the pre-exec hook, which should turn a potential hang to errors in
the pre-exec hook code instead.
Problem: The shell mustache render functionality only allows rendering
of templates for the current task and shell rank, but it may be useful
to use this functionality for an arbitrary task/node.

Add a new `struct mustache_arg` which is passed to the renderer.
For now, fill it in with the shell, current task, and current rank.
Problem: There is no way for shell plugins to check if a mustache
template would render to a different result on a different task or
rank, because flux_shell_mustache_render(3) only works for the current
rank and task.

Add new functions to render mustache templates for any shell rank
or local task.
Problem: The output plugin implements a "term" output type, but this
is no longer used since the shell doesn't support standalone mode.

Remove the shell output term type.
grondo added 5 commits January 7, 2025 01:21
Problem: The current form of the shell output plugin does not lend
itself to being easily extended for per-node or per-task file output.

Refactor the output plugin, with the following major changes:

 - Add a `struct output_stream` for captureing the configuration of
   a single output stream. Combine several stdout_*/stderr_* members
   from the main shell_output structure into single `stdout` and
   `stderr` members.
 - Move the configuration of streams from being sprinkled throughout
   the code to a single point at initialization time.
 - Move the output write service "client" code into output/client.[ch]
   since it is fairly compartmentalized
 - Add a file hash abstraction in output/filehash.[ch] as a better
   abstraction for opening the same file for multiple streams (and this
   will come in handy when supporting per-task output files.

The rest is just cleanup and reorg from the changes above.
Problem: KVS operations of the shell output plugin is mixed in
with other code in output.c, making the code more difficult to
follow than necessary.

Split the KVS output functionality and into output/kvs.[ch] and
under a new `struct kvs_output` abstraction.

Drop the `out->output` json array, which didn't actually ever
accumulate more than 1 entry (the eventlogger object accumulates
writes until the batch timeout).

Since output data doesn't have to be appended to out->output,
drop the unnecessary call to eventlog_entry_pack() and simplify
output write calls so data doesn't have to be unnecessarily
packed/unpacked into an eventlog entry object.

Simplify the open of any output files so it can go between
kvs_output_create() and kvs_output_flush(), to capture redirect
events in the first KVS commit of the output eventlog.

Note that one bug related to redirect events was fixed here.
Previously redirect events only captured the task ranks of the
first shell rank, now all ranks are captured.
Problem: Shell output configuration is mingled with other output
code in the output plugin.

Move output configuration into output/conf.[ch].

Move the `struct shell_output` definition to output/output.h for
shared use by other output plugin components.
Problem: The job shell output plugin does not support per-task or
per-node output.

Detect if output is per-node in config and use this to determine if
output files need to be opened up on ranks other than 0.

Move all task related output handling output/task.[ch]. Create a
per-task output abstraction `struct task_output` and open all output
files for each task so that per-task output files are supported when
the specified output template renders differently for each task.
Problem: Data written via the shell.output plugin topic is currently
always written to the same destination as the first local task.
This leads to data being written to the wrong file when there is a
file open per task.

Use task_output_list_write() to direct data to the correct task.
grondo added 5 commits January 7, 2025 02:08
Problem: The output shell.write service code is mixed in with the main
output plugin, but would better be abstracted into its own source file.

Move the output service code into output/service.[ch].

Export functions shell_output_write_entry() and shell_output_close()
for use by the output_service object to perform the actual write of
data to the selected output destination.

If there are no remote shells, then the `struct output_service`
object is still created, but is a stub.

Abstract refcounting and tracking of lost remote shells into the
service object, since this is not necessary when the the shell.write
service is unused. The output service now only tracks remote shells
for its reference count, since local tasks are alrady referenced by
the shell itself (and libsubprocess does not consider them complete
untill all output has been read).

Since local tasks are no longer directly refcounted, increment the
output plugin refcount directly in task.init, and drop the reference
in task.exit.
Problem: Log messages from the job shell do not end up in the expected
files when using separate output file per rank or per task.

Move the log handling functionality to output/log.[ch] for better
organization. Tweak the `shell.log` callback so it sends log data
to a local stderr file if one is open, regardless of whether it is
invoked on shell rank 0 or not.

Always initialize `shell.log` plugin callback even if stderr is not
a file now that the code does the right thing in all situations.
Problem: The main component of the shell output plugin isn't
in the output subdirectory.

Move it.
Problem: The top-level comment for the main shell output plugin is
out of date.

Update the comment to refer the reader to comments in the various
plugin components, which are more detailed and correct.
Problem: Current documentation specifies that `--output` and `--error`
templates cannot be node or task specific, but this is no longer
the case.

Update documentation of these options to indicate that output
templates may be node/task specific to open separate ouptut files
per node or task.
Problem: There are no tests for per-node/task output file redirection
using the equivalent mustache templates.

Add some tests to t2606-job-shell-output-redirection.t.
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 84.79087% with 120 lines in your changes missing coverage. Please review.

Project coverage is 83.75%. Comparing base (e6b108f) to head (00a337f).

Files with missing lines Patch % Lines
src/shell/output/task.c 86.95% 27 Missing ⚠️
src/shell/output/output.c 82.96% 23 Missing ⚠️
src/shell/output/client.c 67.18% 21 Missing ⚠️
src/shell/output/kvs.c 87.59% 17 Missing ⚠️
src/shell/output/service.c 80.95% 12 Missing ⚠️
src/shell/output/filehash.c 86.11% 10 Missing ⚠️
src/shell/output/conf.c 85.96% 8 Missing ⚠️
src/shell/output/log.c 93.93% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6539      +/-   ##
==========================================
+ Coverage   83.63%   83.75%   +0.12%     
==========================================
  Files         523      530       +7     
  Lines       88088    88155      +67     
==========================================
+ Hits        73669    73831     +162     
+ Misses      14419    14324      -95     
Files with missing lines Coverage Δ
src/common/libsubprocess/fork.c 76.29% <100.00%> (+0.72%) ⬆️
src/shell/shell.c 79.56% <100.00%> (+0.24%) ⬆️
src/shell/output/log.c 93.93% <93.93%> (ø)
src/shell/output/conf.c 85.96% <85.96%> (ø)
src/shell/output/filehash.c 86.11% <86.11%> (ø)
src/shell/output/service.c 80.95% <80.95%> (ø)
src/shell/output/kvs.c 87.59% <87.59%> (ø)
src/shell/output/client.c 67.18% <67.18%> (ø)
src/shell/output/output.c 82.96% <82.96%> (ø)
src/shell/output/task.c 86.95% <86.95%> (ø)

... and 4 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant