WIP: shell: refactor output plugin and enable per-node/task output files #6539
Open: grondo wants to merge 15 commits into flux-framework:master from grondo:output-pertask
Problem: If a pre-exec hook generates a lot of output and fills stdio buffers, then the subprocess child could block inside the pre-exec hook. Since the parent is waiting for the sync fd to close, this could cause a deadlock. Set the stdout and stderr fds to non-blocking for the duration of the pre-exec hook, which should turn a potential hang into errors in the pre-exec hook code instead.
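As a rough illustration of the technique in this commit message, the sketch below toggles `O_NONBLOCK` around a callback so that a write to a full pipe fails instead of blocking. It is not the libsubprocess code: `fd_set_nonblocking()`, `run_hook_nonblocking()`, and `noisy_hook()` are hypothetical names, and the sketch handles a single fd rather than both stdout and stderr.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

typedef int (*hook_f) (void *arg);

/* Set or clear O_NONBLOCK on fd.
 * Returns the previous file status flags, or -1 on error. */
int fd_set_nonblocking (int fd, int enable)
{
    int flags = fcntl (fd, F_GETFL);
    if (flags < 0)
        return -1;
    int newflags = enable ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    if (fcntl (fd, F_SETFL, newflags) < 0)
        return -1;
    return flags;
}

/* Hypothetical wrapper: run hook with fd in non-blocking mode, then
 * restore the original flags. Returns the hook's return value. */
int run_hook_nonblocking (int fd, hook_f hook, void *arg)
{
    int saved = fd_set_nonblocking (fd, 1);
    int rc = hook (arg);
    if (saved >= 0)
        (void) fcntl (fd, F_SETFL, saved);
    return rc;
}

/* Example hook that produces more output than a pipe can buffer.
 * arg points to the fd to write to. */
int noisy_hook (void *arg)
{
    int fd = *(int *) arg;
    char buf[4096];
    memset (buf, 'x', sizeof (buf));
    for (int i = 0; i < 1024; i++) {
        if (write (fd, buf, sizeof (buf)) < 0)
            return -1; /* e.g. EAGAIN: an error instead of a hang */
    }
    return 0;
}
```

With a pipe whose read end is never drained, `run_hook_nonblocking (pfd[1], noisy_hook, &pfd[1])` returns -1 once the pipe buffer fills, where a blocking fd would leave the writer stuck.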
Problem: The shell mustache render functionality only allows rendering of templates for the current task and shell rank, but it may be useful to use this functionality for an arbitrary task/node. Add a new `struct mustache_arg` which is passed to the renderer. For now, fill it in with the shell, current task, and current rank.
Problem: There is no way for shell plugins to check if a mustache template would render to a different result on a different task or rank, because flux_shell_mustache_render(3) only works for the current rank and task. Add new functions to render mustache templates for any shell rank or local task.
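To make the motivation concrete, here is a toy sketch of rendering one template for two different contexts and comparing the results. It does not use the flux-core API: `struct render_arg`, `toy_render()`, and `renders_differently()` are hypothetical, and the two tags handled here (`{{node.id}}` and `{{task.id}}`) are illustrative assumptions rather than a statement of which tags the shell supports.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a per-render context */
struct render_arg {
    int shell_rank; /* node index */
    int task_rank;  /* global task rank */
};

static const char node_tag[] = "{{node.id}}";
static const char task_tag[] = "{{task.id}}";

/* Expand the two toy tags in tmpl into buf. Any other text is copied
 * through unchanged. Returns 0 on success, -1 if buf is too small. */
int toy_render (const char *tmpl,
                const struct render_arg *arg,
                char *buf,
                size_t len)
{
    size_t n = 0;
    if (len == 0)
        return -1;
    buf[0] = '\0';
    while (*tmpl) {
        int w;
        if (strncmp (tmpl, node_tag, sizeof (node_tag) - 1) == 0) {
            w = snprintf (buf + n, len - n, "%d", arg->shell_rank);
            tmpl += sizeof (node_tag) - 1;
        }
        else if (strncmp (tmpl, task_tag, sizeof (task_tag) - 1) == 0) {
            w = snprintf (buf + n, len - n, "%d", arg->task_rank);
            tmpl += sizeof (task_tag) - 1;
        }
        else
            w = snprintf (buf + n, len - n, "%c", *tmpl++);
        if (w < 0 || (size_t) w >= len - n)
            return -1;
        n += (size_t) w;
    }
    return 0;
}

/* Return 1 if tmpl renders differently for contexts a and b,
 * 0 if the results are identical, -1 on render failure. */
int renders_differently (const char *tmpl,
                         const struct render_arg *a,
                         const struct render_arg *b)
{
    char x[256];
    char y[256];
    if (toy_render (tmpl, a, x, sizeof (x)) < 0
        || toy_render (tmpl, b, y, sizeof (y)) < 0)
        return -1;
    return strcmp (x, y) != 0;
}
```

A plugin could use a comparison like this to decide whether a single shared file is enough or whether each node or task needs its own file.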
Problem: The output plugin implements a "term" output type, but this is no longer used since the shell doesn't support standalone mode. Remove the shell output term type.
Problem: The current form of the shell output plugin does not lend itself to being easily extended for per-node or per-task file output. Refactor the output plugin, with the following major changes:
- Add a `struct output_stream` for capturing the configuration of a single output stream. Combine several stdout_*/stderr_* members from the main shell_output structure into single `stdout` and `stderr` members.
- Move the configuration of streams from being sprinkled throughout the code to a single point at initialization time.
- Move the output write service "client" code into output/client.[ch] since it is fairly compartmentalized.
- Add a file hash abstraction in output/filehash.[ch] as a better abstraction for opening the same file for multiple streams (this will also come in handy when supporting per-task output files).
The rest is just cleanup and reorg from the changes above.
Problem: KVS operations of the shell output plugin are mixed in with other code in output.c, making the code more difficult to follow than necessary. Split the KVS output functionality into output/kvs.[ch] under a new `struct kvs_output` abstraction. Drop the `out->output` json array, which didn't actually ever accumulate more than 1 entry (the eventlogger object accumulates writes until the batch timeout). Since output data doesn't have to be appended to out->output, drop the unnecessary call to eventlog_entry_pack() and simplify output write calls so data doesn't have to be unnecessarily packed/unpacked into an eventlog entry object. Simplify the opening of any output files so it can go between kvs_output_create() and kvs_output_flush(), to capture redirect events in the first KVS commit of the output eventlog. Note that one bug related to redirect events was fixed here. Previously redirect events only captured the task ranks of the first shell rank; now all ranks are captured.
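The idea of sharing one open file between several streams can be sketched as a small reference-counted table keyed by path. This is only an assumption-laden illustration and not the contents of output/filehash.[ch]: the names `file_table`, `file_table_open()`, and `file_entry_close()` are hypothetical, and a fixed-size array with a linear scan stands in for a real hash.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_FILES 16

struct file_entry {
    char path[256];
    int fd;
    int refcount; /* 0 means the slot is unused */
};

struct file_table {
    struct file_entry entries[MAX_FILES];
};

/* Return the entry for path, opening the file on first use and
 * bumping the reference count on later requests.
 * Returns NULL if the table is full, the path is too long,
 * or open(2) fails. */
struct file_entry *file_table_open (struct file_table *t, const char *path)
{
    struct file_entry *unused = NULL;
    for (int i = 0; i < MAX_FILES; i++) {
        struct file_entry *e = &t->entries[i];
        if (e->refcount > 0 && strcmp (e->path, path) == 0) {
            e->refcount++;
            return e;
        }
        if (e->refcount == 0 && !unused)
            unused = e;
    }
    if (!unused || strlen (path) >= sizeof (unused->path))
        return NULL;
    int fd = open (path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return NULL;
    snprintf (unused->path, sizeof (unused->path), "%s", path);
    unused->fd = fd;
    unused->refcount = 1;
    return unused;
}

/* Drop one reference. The file is closed when the last user is done. */
void file_entry_close (struct file_entry *e)
{
    if (e && e->refcount > 0 && --e->refcount == 0) {
        close (e->fd);
        e->path[0] = '\0';
    }
}
```

When stdout and stderr are directed to the same path, both streams get the same entry and therefore the same fd, so the file is opened only once.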
Problem: Shell output configuration is mingled with other output code in the output plugin. Move output configuration into output/conf.[ch]. Move the `struct shell_output` definition to output/output.h for shared use by other output plugin components.
Problem: The job shell output plugin does not support per-task or per-node output. Detect if output is per-node in config and use this to determine if output files need to be opened up on ranks other than 0. Move all task-related output handling to output/task.[ch]. Create a per-task output abstraction `struct task_output` and open all output files for each task so that per-task output files are supported when the specified output template renders differently for each task.
Problem: Data written via the shell.output plugin topic is currently always written to the same destination as the first local task. This leads to data being written to the wrong file when there is a file open per task. Use task_output_list_write() to direct data to the correct task.
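The routing described here can be pictured with a toy task list in which each task owns its own destination. This is a sketch only: `toy_list_add()` and `toy_list_write()` are hypothetical and are not the signature of `task_output_list_write()`, and an in-memory buffer stands in for each task's open file.

```c
#include <string.h>

#define MAX_TASKS 8

/* Hypothetical per-task record: the buffer stands in for a per-task file */
struct toy_task_output {
    int rank;       /* global task rank */
    char data[128]; /* output accumulated for this task */
};

struct toy_task_output_list {
    struct toy_task_output tasks[MAX_TASKS];
    int count;
};

/* Register a local task. Returns 0 on success, -1 if the list is full. */
int toy_list_add (struct toy_task_output_list *l, int rank)
{
    if (l->count >= MAX_TASKS)
        return -1;
    l->tasks[l->count].rank = rank;
    l->tasks[l->count].data[0] = '\0';
    l->count++;
    return 0;
}

/* Append data to the destination owned by the task with this rank,
 * rather than always using the first local task.
 * Returns 0 on success, -1 if the rank is not local or the buffer is full. */
int toy_list_write (struct toy_task_output_list *l, int rank, const char *data)
{
    for (int i = 0; i < l->count; i++) {
        struct toy_task_output *t = &l->tasks[i];
        if (t->rank == rank) {
            size_t used = strlen (t->data);
            if (used + strlen (data) >= sizeof (t->data))
                return -1;
            strcpy (t->data + used, data);
            return 0;
        }
    }
    return -1;
}
```

Looking up the destination by task rank is what keeps data for one task out of another task's file when a file is open per task.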
Problem: The output shell.write service code is mixed in with the main output plugin, but would better be abstracted into its own source file. Move the output service code into output/service.[ch]. Export functions shell_output_write_entry() and shell_output_close() for use by the output_service object to perform the actual write of data to the selected output destination. If there are no remote shells, then the `struct output_service` object is still created, but is a stub. Abstract refcounting and tracking of lost remote shells into the service object, since this is not necessary when the shell.write service is unused. The output service now only tracks remote shells for its reference count, since local tasks are already referenced by the shell itself (and libsubprocess does not consider them complete until all output has been read). Since local tasks are no longer directly refcounted, increment the output plugin refcount directly in task.init, and drop the reference in task.exit.
Problem: Log messages from the job shell do not end up in the expected files when using a separate output file per rank or per task. Move the log handling functionality to output/log.[ch] for better organization. Tweak the `shell.log` callback so it sends log data to a local stderr file if one is open, regardless of whether it is invoked on shell rank 0 or not. Always initialize the `shell.log` plugin callback even if stderr is not a file, now that the code does the right thing in all situations.
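A minimal sketch of the reference counting described in this commit message follows, under the assumption that the service runs on shell rank 0 and holds one reference per remote shell. The `toy_output_service` type and its functions are hypothetical and do not reflect the layout of `struct output_service`.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SHELLS 64

struct toy_output_service {
    int size;              /* total number of shells in the job */
    int refcount;          /* remote shells that have not finished yet */
    bool done[MAX_SHELLS]; /* true once a rank has closed or was lost */
};

/* Initialize for a job with shell_size shells. With a single shell
 * there are no remote shells, so the service is effectively a stub. */
void toy_service_init (struct toy_output_service *s, int shell_size)
{
    memset (s, 0, sizeof (*s));
    if (shell_size > MAX_SHELLS)
        shell_size = MAX_SHELLS;
    s->size = shell_size;
    s->refcount = shell_size > 1 ? shell_size - 1 : 0;
}

/* Drop the reference held for a remote shell, whether it closed its
 * output normally or was lost. Repeated calls for the same rank and
 * calls for rank 0 are ignored. Returns the remaining reference count. */
int toy_service_release (struct toy_output_service *s, int shell_rank)
{
    if (shell_rank <= 0 || shell_rank >= s->size || s->done[shell_rank])
        return s->refcount;
    s->done[shell_rank] = true;
    if (s->refcount > 0)
        s->refcount--;
    return s->refcount;
}
```

Tracking a per-rank `done` flag means a shell that sends EOF and is later reported lost is only counted once.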
Problem: The main component of the shell output plugin isn't in the output subdirectory. Move it.
Problem: The top-level comment for the main shell output plugin is out of date. Update the comment to refer the reader to comments in the various plugin components, which are more detailed and correct.
Problem: Current documentation specifies that `--output` and `--error` templates cannot be node or task specific, but this is no longer the case. Update documentation of these options to indicate that output templates may be node/task specific to open separate output files per node or task.
Problem: There are no tests for per-node/task output file redirection using the equivalent mustache templates. Add some tests to t2606-job-shell-output-redirection.t.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6539      +/-   ##
==========================================
+ Coverage   83.63%   83.75%   +0.12%
==========================================
  Files         523      530       +7
  Lines       88088    88155      +67
==========================================
+ Hits        73669    73831     +162
+ Misses      14419    14324      -95
This PR refactors the job shell output plugin with the intent of making per-node/task output files possible.
This PR is currently marked as a WIP because I think eventually the incremental refactor commits should all be squashed into a single "rewrite output plugin" commit, as I'm not sure the separate commits would offer that much in the git history. However, I left them here for now because it might be slightly easier to review the process (though hopefully it doesn't make a review more confusing).
The refactor splits the output plugin into components now located under `src/shell/output/`. The main source of new functionality is the "task output list" abstraction found in `src/shell/output/task.[ch]`, which allows each task to separately open an output file and write data to it. Per-shell and per-task output files are supported simply by specifying a node- or task-specific mustache tag in the `--output` or `--error` template. Tasks then either write directly to a file if they've opened one, or to the shell output service if shell_rank > 0, or directly to the KVS eventlogger on rank 0.
The output plugin also has to handle log messages (see `shell/output/log.[ch]`) and `shell.output` plugin calls (currently only made by the `pty` plugin). Log messages are written to the same destination as the local task index=0, while `shell.output` data is written to the appropriate task in the task output list.
Other high-level changes (obvious when going through the commits):
- `term` (terminal) output mode is removed
- output configuration is moved into `shell/output/conf.[ch]`
- `output` array is removed (was no longer necessary)
A couple other potential issues:
- the `{{tmpdir}}` tag has to be treated as a special case now (though it is unlikely anyone would use that tag in their output template).
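
The destination rule stated in the description (a task writes to its own file if it opened one, otherwise to the shell output service when shell_rank > 0, otherwise to the KVS eventlogger on rank 0) can be written down as a small decision function. The enum and function names below are hypothetical and only restate that rule; they are not taken from the plugin source.

```c
#include <stdbool.h>

enum toy_dest {
    TOY_DEST_FILE,    /* task opened its own output file */
    TOY_DEST_SERVICE, /* forward to the output service on rank 0 */
    TOY_DEST_KVS,     /* write to the KVS eventlogger on rank 0 */
};

/* Select where a task's output goes, following the order given in the
 * PR description: an open file wins, then the shell rank decides. */
enum toy_dest toy_select_dest (bool file_open, int shell_rank)
{
    if (file_open)
        return TOY_DEST_FILE;
    if (shell_rank > 0)
        return TOY_DEST_SERVICE;
    return TOY_DEST_KVS;
}

/* Human-readable name for a destination, e.g. for debug logging */
const char *toy_dest_name (enum toy_dest d)
{
    switch (d) {
        case TOY_DEST_FILE:
            return "file";
        case TOY_DEST_SERVICE:
            return "service";
        case TOY_DEST_KVS:
            return "kvs";
    }
    return "unknown";
}
```

For example, a task on shell rank 2 with no file open would be routed to the service, while the same task with a per-task file open would bypass the service entirely.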