
Implement rv1exec reader update to facilitate Fluxion reload #1176

Merged: 8 commits, merged Apr 24, 2024

Conversation

@milroy (Member) commented Apr 15, 2024

This PR adds support for update () with the rv1exec/rv1_nosched format. Previously, only JGF was implemented and supported for Fluxion reload. Since RV1 without the scheduling key doesn't contain information about edges, exclusivity, subsystems, sizes, etc., assumptions need to be made about the graph structure. The only supported subsystem is currently containment. Ranks are assumed to exist at the node level, and node-to-child edges are assumed to be marked exclusive. Applying the reader to a resource graph created from another format (e.g., JGF) may result in undesirable behavior due to the mismatch between graph granularities.
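
For illustration, the assumed shape can be sketched with a tiny Boost Graph Library program (the property structs and names below are made up for this example and are not the reader's actual types): one vertex per rank at the node level, with every node-to-child edge marked exclusive.

    #include <boost/graph/adjacency_list.hpp>
    #include <iostream>
    #include <string>

    // Hypothetical, simplified stand-ins for Fluxion's vertex/edge properties
    struct resource_prop {
        std::string type;
        int rank = -1;
    };
    struct relation_prop {
        bool exclusive = false;
    };

    using graph_t = boost::adjacency_list<boost::vecS, boost::vecS,
                                          boost::directedS,
                                          resource_prop, relation_prop>;

    int main ()
    {
        graph_t g;
        // rv1exec assumption: one rank maps to one node-level vertex
        resource_prop node_prop{"node", 0};
        auto node = boost::add_vertex (node_prop, g);
        // Children inferred from R_lite (e.g., cores); every node-to-child
        // edge is assumed exclusive since R carries no edge information
        for (int i = 0; i < 4; i++) {
            resource_prop core_prop{"core", 0};
            auto core = boost::add_vertex (core_prop, g);
            boost::add_edge (node, core, relation_prop{true}, g);
        }
        std::cout << boost::num_vertices (g) << " vertices, "
                  << boost::num_edges (g) << " exclusive edges\n";
        return 0;
    }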

Handling ranks in the unit tests differs from other resource-query-based tests because resource-query simulates the behavior of the scheduler independently of the resource manager and execution system. To enable sharness tests with resource-query, the files need to explicitly map ranks >= 0 to resources.

Note that the implementation in this PR does not update the aggregate or exclusivity filters. That functionality is performed by the traverser here:

    if (m_graph_db->metadata.v_rt_edges[dom].get_trav_token ()
        != m_best_k_cnt) {
        // This condition occurs when the subgraph came from a
        // traverser different from this traverser, for example,
        // a traverser whose dominant subsystem is different than this.
        return 0;
    }
    excl = m_graph_db->metadata.v_rt_edges[dom].get_exclusive ();
    x = (excl == 0)? false : true;
    needs = static_cast<unsigned int>(m_graph_db->metadata
                                          .v_rt_edges[dom].get_needs ());
    m_color.reset ();
    bool emit_shadow = modify_traversal (root, false);
    if ( (rc = upd_dfv (root, writers, needs,
                        x, jobmeta, false, dfu, emit_shadow)) > 0) {
        uint64_t starttime = jobmeta.at;
        uint64_t endtime = jobmeta.at + jobmeta.duration;
        if (writers->emit_tm (starttime, endtime) == -1) {
            m_err_msg += __FUNCTION__;
            m_err_msg += ": emit_tm returned -1.\n";
        }
        if (jobmeta.is_queue_set ()) {
            if (writers->emit_attrs ("queue", jobmeta.get_queue ()) == -1) {
                m_err_msg += __FUNCTION__;
                m_err_msg += ": emit_attrs returned -1.\n";
            }
        }
    }
    return (rc > 0)? 0: -1;
}

Note that there are a couple of TODOs:

  • calling the find_vertex () function when creating vertices results in errors in existing unit tests. I'm reasonably sure this is due to the fact that only one rank exists (-1) for the unit tests, but I need to double-check.
  • may need more test cases; reviewer feedback is welcome here

Resolves issue #991.

@milroy milroy linked an issue Apr 15, 2024 that may be closed by this pull request
@milroy (Member Author) commented Apr 15, 2024

@grondo or @garlick I don't think the tests I've added are sufficient to test the desired reload behavior. I'm going to devise more, but I'd like your input if you have suggestions.

@garlick (Member) commented Apr 15, 2024

Fantastic! I'm going to do a little manual testing and see if I can come up with some ideas for automated testing.

So, the expected shortcoming here is that a node-exclusive allocation may not remain exclusive after a scheduler reload? Does that matter? My understanding is that if you ask for, say, one core of a node with 32 cores but set the exclusive flag (or if the scheduler is configured for node exclusivity), you get all 32 cores. So on reload, the scheduler would just see R with the whole node allocated, and it shouldn't matter whether the node was marked exclusive or not because the "unused" cores are not really unused as far as the scheduler is concerned.

@garlick (Member) commented Apr 15, 2024

@grondo and I both gave this a quick sanity check where we submitted a sleep inf job then reloaded the fluxion modules. We found that it works for a simple config (yay!)

sched-fluxion-qmanager.debug[0]: requeue success (queue=default id=378547973585371136)

but not when queues are configured:

jobmanager_hello_cb: ENOENT: map::at: No such file or directory

@grondo (Contributor) commented Apr 15, 2024

This same error was noted with rv1 earlier in this comment. Possibly I should have opened a separate issue on it, sorry!

@garlick (Member) commented Apr 15, 2024

Hmm, my job that failed with queues configured did not set attributes.system.scheduler.queue in R. That appears to be necessary for fluxion to find the right queue:

https://github.com/flux-framework/flux-sched/blob/master/qmanager/modules/qmanager_callbacks.cpp#L159

@garlick (Member) commented Apr 15, 2024

See also:

I wonder why it is important to put the job into the correct queue after it's already been allocated resources? Assuming that's just the container that is available, what if we just always put these jobs in the default queue?

@milroy (Member Author) commented Apr 15, 2024

So, the expected shortcoming here is an node-exclusive allocation may not remain exclusive after a scheduler reload?

Actually, this PR should handle that case. I am concerned that there are some edge cases where the exclusivity flags per edge aren't set correctly, and then the traversal here:

if ( (rc = upd_dfv (root, writers, needs,

doesn't set the filters correctly. The implication is probably just loss of traversal optimization rather than a problem with the allocation.

@milroy (Member Author) commented Apr 15, 2024

I wonder why it is important to put the job into the correct queue after it's already been allocated resources? Assuming that's just the container that is available, what if we just always put these jobs in the default queue?

That makes sense to me. Should I add logic to this PR to insert the job in the default queue regardless of whatever attributes.system.scheduler.queue says, since it's being deprecated?

@garlick (Member) commented Apr 15, 2024

I guess if it makes sense to you! Since I guess the default queue name isn't defined when named queues are being used, I just did this and it seems to work.

diff --git a/qmanager/modules/qmanager_callbacks.cpp b/qmanager/modules/qmanager_callbacks.cpp
index ae8b68f7..205e1101 100644
--- a/qmanager/modules/qmanager_callbacks.cpp
+++ b/qmanager/modules/qmanager_callbacks.cpp
@@ -125,10 +125,7 @@ int qmanager_cb_t::jobmanager_hello_cb (flux_t *h, const flux_msg_t *msg,
 
 {
     int rc = 0;
-    json_t *o = NULL;
-    json_error_t err;
     std::string R_out;
-    char *qn_attr = NULL;
     std::string queue_name;
     std::shared_ptr<queue_policy_base_t> queue;
     std::shared_ptr<job_t> running_job = nullptr;
@@ -148,28 +145,8 @@ int qmanager_cb_t::jobmanager_hello_cb (flux_t *h, const flux_msg_t *msg,
         goto out;
     }
 
-    if ( (o = json_loads (R, 0, &err)) == NULL) {
-        rc = -1;
-        errno = EPROTO;
-        flux_log (h, LOG_ERR, "%s: parsing R for job (id=%jd): %s %s@%d:%d",
-                  __FUNCTION__, static_cast<intmax_t> (id),
-                  err.text, err.source, err.line, err.column);
-        goto out;
-    }
-    if ( (rc = json_unpack (o, "{ s?:{s?:{s?:{s?:s}}} }",
-                                   "attributes",
-                                       "system",
-                                           "scheduler",
-                                                "queue", &qn_attr)) < 0) {
-        json_decref (o);
-        errno = EPROTO;
-        flux_log (h, LOG_ERR, "%s: json_unpack for attributes", __FUNCTION__);
-        goto out;
-    }
-
-    queue_name = qn_attr? qn_attr : ctx->opts.get_opt ().get_default_queue_name ();
-    json_decref (o);
-    queue = ctx->queues.at (queue_name);
+    queue_name = ctx->queues.begin()->first;
+    queue = ctx->queues.begin()->second;
     running_job = std::make_shared<job_t> (job_state_kind_t::RUNNING,
                                                    id, uid, calc_priority (prio),
                                                    ts, R);

@garlick (Member) commented Apr 15, 2024

I'll see if I can work up a test script that covers the basics here. The tests that are failing (before the patch I posted above) are:

17/92 Test #17: t1009-recovery-multiqueue.t ...........***Failed   20.74 sec
[snip]
not ok 6 - recovery: works when both modules restart (rv1)
#	
#		flux dmesg -C &&
#		remove_qmanager &&
#		reload_resource match-format=rv1 policy=high &&
#		load_qmanager_sync &&
#		check_requeue ${jobid1} batch &&
#		check_requeue ${jobid2} debug &&
#		test_must_fail flux job wait-event -t 0.5 ${jobid3} start &&
#		test_expect_code 3 flux ion-resource info ${jobid3}
#	

expecting success: 
	flux cancel ${jobid2} &&
	flux job wait-event -t 10 ${jobid4} start

flux-job: wait-event timeout on event 'start'
not ok 7 - recovery: a cancel leads to a job schedule (rv1)
#	
#		flux cancel ${jobid2} &&
#		flux job wait-event -t 10 ${jobid4} start
#	

expecting success: 
	flux cancel ${jobid1} &&
	flux cancel ${jobid3} &&
	flux cancel ${jobid4} &&
	flux job wait-event -t 10 ${jobid4} release

1713212995.938612 release ranks="all" final=true
[snip]
# failed 2 among 10 test(s)

@milroy (Member Author) commented Apr 15, 2024

I'll see if I can work up a test script that covers the basics here. The tests that are failing (before the patch I posted above) are

I'm a bit confused. Those tests pass for me before applying the patch above and fail after applying the patch.

@garlick (Member) commented Apr 15, 2024

Oh, hmm, maybe I wasn't testing the right thing. Will double check.

@garlick (Member) commented Apr 15, 2024

I retested with your branch and that test is not failing. Sorry about that!

@garlick (Member) commented Apr 15, 2024

@milroy - I'm out of time for today but I did push a new test script to my rv1-update-jim branch.

  • b116c9a testsuite: add test for reloading with rv1_nosched
  • 88e469a qmanager: fix module reload with rv1, queues, jobs

I guess that test breakage is mine :-(

@milroy (Member Author) commented Apr 15, 2024

@milroy - I'm out of time for today but I did push a new test script to my rv1-update-jim branch.

Thanks for the help @garlick!

@milroy (Member Author) commented Apr 16, 2024

jobmanager_hello_cb: ENOENT: map::at: No such file or directory

This problem appears to be due to the default queue name ("default") key not existing in the ctx->queues map. Apparently it never gets added.

@milroy (Member Author) commented Apr 16, 2024

It looks like the default name getter is returning garbage:

    queue_it = ctx->queues.find (ctx->opts.get_opt ().get_default_queue_name ());
    if (queue_it != ctx->queues.end ()) {
        queue = queue_it->second;
        queue_name = queue_it->first;
    } else {
        flux_log (h, LOG_ERR, "%s: queue %s not recognized; default is %s ",
                  __FUNCTION__, queue_name,
                  ctx->opts.get_opt ().get_default_queue_name ());
        rc = -1;
        goto out;
    }

And the log output:

Apr 16 01:17:52.765851 sched-fluxion-qmanager.err[0]: jobmanager_hello_cb: queue ???L?? not recognized; default is ??L?? 

The default is set here:

std::string m_default_queue_name = "default";

and there doesn't appear to be a way to update the value. The opt getter appears to be corrupting the value somehow:

const T &get_opt () const {

@garlick (Member) commented Apr 16, 2024

In case it helps, here is some background on the default queue - it is confusing in this context because queues in fluxion predate queues in flux-core. (I hope my fluxion assertions are correct as I'm making them from memory)

In flux-core, when no named queues are configured, there is a single anonymous queue. The default queue is instantiated in fluxion when no named queues are configured in flux-core.

When named queues are configured in flux-core, there is no anonymous queue, and only the named queues are instantiated in fluxion. The default queue is undefined.

To make matters more confusing, flux-core has a concept of a default queue. However, in this context the default queue is a jobspec modification at job ingest time. If a default queue is not configured, then jobs that don't specify a queue when named queues are configured are rejected at ingest.

@garlick (Member) commented Apr 16, 2024

The tests that are failing after my patch was added may indicate a problem with the approach of placing allocated jobs in a random (first) queue, particularly this test, where a job in the same queue as a job that was "recovered" is supposed to be scheduled when the recovered job is canceled:

not ok 7 - recovery: a cancel leads to a job schedule (rv1)
#	
#		flux cancel ${jobid2} &&
#		flux job wait-event -t 10 ${jobid4} start
#	

Possibly the problem is that a queue that has reached a point where nothing in it can be scheduled is not revisited until the allocated jobs are removed from it. Thus placing a job in a random queue on restart may prevent that queue from making progress when the job is freed.

This comment in queue_policy_base.hpp bolsters this somewhat

    /*! Return true if this queue has become schedulable since
     *  its state had been reset with set_schedulability (false).
     *  "Being schedulable" means one or more job or resource events
     *  have occurred such a way that the scheduler should run the
     *  scheduling loop for the pending jobs: e.g., a new job was
     *  inserted into the pending job queue or a job was removed from
     *  the running job queue so that its resource was released.
     */
    bool is_schedulable ();
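
As a toy illustration of that flag-based pattern (this is not Fluxion's queue_policy_base_t, just a sketch of the behavior the comment describes): if a recovered job lands in the wrong queue on restart, cancelling it only marks that queue schedulable, so the queue whose pending jobs are actually waiting on those resources never re-runs its scheduling loop.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical toy queue: the scheduling loop only re-runs for a queue
    // after an event (job added or removed) flips its schedulability flag.
    struct toy_queue {
        std::string name;
        bool schedulable = false;
        std::vector<int> running;

        void remove_running (int id)
        {
            running.erase (std::remove (running.begin (), running.end (), id),
                           running.end ());
            schedulable = true;     // resources released: revisit pending jobs
        }
    };

    int main ()
    {
        toy_queue batch{"batch"}, debug{"debug"};
        // Recovered job 42 really belongs to "debug", but restart placed it
        // in "batch"; cancelling it only wakes up "batch"
        batch.running.push_back (42);
        batch.remove_running (42);
        std::cout << "batch schedulable: " << batch.schedulable
                  << ", debug schedulable: " << debug.schedulable << "\n";
        return 0;
    }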

@garlick (Member) commented Apr 16, 2024

Since "hello" is not on a critical performance path, maybe the simplest thing would be to look up the jobspec to obtain the queue name. I'll experiment with that.

@garlick (Member) commented Apr 16, 2024

BTW I think the garbage queue name from flux_log() may be due to the fact that it is a std::string and needs the c_str() method called on it before being passed to flux_log().
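
For reference, a minimal standalone example of that pitfall, using std::printf in place of flux_log () since both take printf-style format strings:

    #include <cstdio>
    #include <string>

    int main ()
    {
        std::string queue_name = "batch";

        // Passing a std::string to a %s conversion is undefined behavior and
        // typically prints garbage:
        //   std::printf ("queue %s not recognized\n", queue_name);

        // Converting with c_str () first is correct:
        std::printf ("queue %s not recognized\n", queue_name.c_str ());
        return 0;
    }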

@garlick (Member) commented Apr 16, 2024

I think I've got a more workable solution where the queue name is picked up from the jobspec, however, my new test is failing in interesting ways that I need to understand. I'll post an update soon once I've figured that out.

@garlick (Member) commented Apr 16, 2024

OK, those tests are passing now. Feel free to cherry-pick these commits from my branch (instead of the originals):

caec1a4 - qmanager: look up job queue name in KVS

2e2ae23 - testsuite: add test for reloading with rv1_nosched

@garlick (Member) commented Apr 16, 2024

I just realized several error paths lead to success in that hello callback, which could cause double booking of resources. So I've got a couple more commits:

bfc1617 qmanager: fix error handling during hello

7f11e19 manager: improve unknown queue hello error

The first one at least ought to be included in this PR since it's somewhat serious.

@milroy (Member Author) commented Apr 17, 2024

BTW I think the garbage queue name from flux_log() may be due to the fact that it is a std::string and needs the c_str() method called on it before being passed to flux_log().

You're right. I should have seen that.

@milroy (Member Author) commented Apr 17, 2024

In case it helps, here is some background on the default queue - it is confusing in this context because queues in fluxion predate queues in flux-core. (I hope my fluxion assertions are correct as I'm making them from memory)

The background does help; thanks for the additional details. Now I understand what was going on with the missing "default" queue.

@garlick (Member) commented Apr 17, 2024

NP! Did you want me to push those commits to your branch and then you can review them a little more conveniently?

@milroy (Member Author) commented Apr 17, 2024

I just realized several error paths lead to success in that hello callback, which could cause double booking of resources.

Yikes, thanks for catching this! For the sake of clarity, prior to this PR there's one error path that can lead to a spurious success:

    if (flux_msg_unpack (msg,
                         "{s:I s:i s:i s:f}",
                         "id", &id,
                         "priority", &prio,
                         "userid", &uid,
                         "t_submit", &ts) < 0) {
        flux_log_error (h, "%s: flux_msg_unpack", __FUNCTION__);
        goto out;
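
A hypothetical sketch (not the actual Fluxion callback) of that class of bug and the usual fix: initialize rc to failure and only set it to 0 after every step has succeeded, so any goto out from an error path reports failure:

    #include <cstdio>

    static int hello_cb_sketch (bool unpack_ok, bool queue_ok)
    {
        int rc = -1;            // assume failure until everything succeeds

        if (!unpack_ok) {
            std::fprintf (stderr, "hello: message unpack failed\n");
            goto out;           // rc is still -1, so the error propagates
        }
        if (!queue_ok) {
            std::fprintf (stderr, "hello: unknown queue\n");
            goto out;
        }
        rc = 0;                 // success only after every check passed
    out:
        return rc;
    }

    int main ()
    {
        return hello_cb_sketch (true, false) == -1 ? 0 : 1;
    }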

@garlick (Member) left a comment

The part of this in qmanager that I understand looks good. Could we have another sched developer review the resource part?

I did install this on my test cluster and verified that I can now reload the scheduler with running jobs, and the resource counts look good (both in core and in sched, by setting FLUX_RESOURCE_LIST_RPC=sched-fluxion-resource).

Great work @milroy !

@trws (Member) commented Apr 22, 2024

Reviewing now; hopefully I can get through a pass before I get to the airport.

goto out;
}
}
if (json_unpack (jobspec,
Review comment (Member):

We really need a json path wrapper for things like this. Definitely not in this PR but mentioning it to remind myself.

@trws (Member) left a comment

This is great. There are a couple of style or performance issues I noted, but as soon as those get tweaked I'm all good with this. Awesome work tracking all this down @milroy.

@garlick, @grondo I'm almost to the airport to hop a plane to Oregon; in case I don't have internet, feel free to dismiss this and approve without waiting for me to take another pass.

resource/readers/resource_reader_rv1exec.hpp (outdated, resolved)

// Search graph metadata for vertex
auto vtx_iter = m.by_path.find (path);
if (vtx_iter != m.by_path.end ()) {
Review comment (Member):

This is nitpicky, but as a style note, try to have checks like this with a long span use early exits rather than wrap the whole span. For example, if (vtx_iter == m.by_path.end()) return vtx;.

@milroy (Member Author) replied:

Done. I changed a couple of other places where this comment applied as well.

resource/readers/resource_reader_rv1exec.cpp (outdated, resolved)
resource/readers/resource_reader_rv1exec.cpp (outdated, resolved)
@milroy (Member Author) commented Apr 23, 2024

Thanks for the review and feedback @trws! It may be worth taking a look at the changes I made to address your comment on early exits.

@milroy (Member Author) commented Apr 23, 2024

I updated the early exit code sections to be more aligned with how Fluxion currently handles conditional exits. For example:

    const auto &vtx_iter = m.by_path.find (path);
    // Not found; return null_vertex
    if (vtx_iter == m.by_path.end ())
        return null_vtx;
    // Found in by_path
    for (vtx_t v : vtx_iter->second) {
        if (g[v].rank == rank
                && g[v].id == id
                && g[v].size == size
                && g[v].type == type) {
            return v;

Versus:

    const auto &vtx_iter = m.by_path.find (path);
    // Not found; return null_vertex
    if (vtx_iter == m.by_path.end ()) {
        return null_vtx;
    } else {
        // Found in by_path
        for (vtx_t v : vtx_iter->second) {
            if (g[v].rank == rank
            [. . .]

Do you have a preference between these?

@trws (Member) commented Apr 23, 2024

Do you have a preference between these?

The former. It's a style I'm pretty sure I learned from @garlick and @grondo in flux-core. We have a lot of code that has a lot of potential error propagation conditions; if we do the second style, then every time we check one we have to increase a nesting level, whereas the former leaves everything flat. To me it makes the code much more readable, because you can easily tell the difference between a condition that's an early exit to propagate an error or unknown condition and code where there are actually two different code paths during normal execution.

Avoiding nesting isn't a hard and fast rule, but this case is about as close as it gets for me. If the then branch of the condition ends in either a return or a goto cleanup or similar, it shouldn't have an else because the execution is identical whether it nests or not.

@trws (Member) left a comment

It looks really good, thanks @milroy. Added one comment just on factoring, but it's entirely optional.

resource/readers/resource_reader_rv1exec.cpp (outdated, resolved)
@milroy (Member Author) commented Apr 23, 2024

To me it makes the code much more readable, because you can easily tell the difference between a condition that's an early exit to propagate an error or unknown condition and code where there are actually two different code-paths during normal execution.

That makes sense and I agree, but I thought I'd check.

@milroy force-pushed the rv1-update branch 3 times, most recently from 72ef733 to 1cae444 on April 24, 2024 01:03
milroy and others added 8 commits April 23, 2024 19:27
Problem: issue flux-framework#991 identified the need for an `rv1exec`
implementation of update (). The need for the implementation is
described in detail in the issue, but the primary motivation is to
enable reloading Fluxion when using RV1 without the scheduling key and
payload. The reader was not originally implemented due to the lack of
information in the format. Examples include edges, exclusivity, paths
with subsystems, and vertex sizes. To create a workable implementation,
strong assumptions need to be made about resource exclusivity and
edges.

Add support for update () through helper functions that update vertex
and edge metadata and infer exclusivity of node-level resources.

Problem: the sched-fluxion-resource.update RPC only supports JGF.

Add a check for an empty scheduling key and fall back to rv1exec.

Problem: the `resource-query` tool currently only supports update via
JGF.

Add support for rv1exec, including additional options handling. This
enables sharness tests for reload and updates under rv1exec.

Problem: the current tests for reloading Fluxion with rv1exec fail due
to lack of update support. Now that the support is added, we need tests
for correct behavior.

Add new tests to check for ability to reload Fluxion and update a
resource graph with rv1exec. Update existing tests to use the
additional option required by resource-query for update.

Problem: when the fluxion modules are reloaded with running jobs,
and the match format is "rv1_nosched", and queues are enabled,
running jobs are killed with a fatal scheduler-restart exception.

During the hello handshake defined by RFC 27, the scheduler is informed
during its initialization of jobs that are holding resources.  The hello
callback in qmanager retrieves the job's queue name from the R key
"attributes.system.scheduler.queue".  The queue name is used to locate
the proper queue for the job to be inserted into.

This attribute is not being set in R when the match format is
"rv1_nosched" so the default queue is assumed.  Since the default queue
is not instantiated when named queues are defined, a fatal job exception
is raised when the queue lookup fails.

There has been a proposal to deprecate the R attribute (flux-framework#1108), so
rather than ensure that it is set in this case, determine the queue
instead by fetching the jobspec from the KVS.

Fixes flux-framework#1108

Problem: a few specific use cases related to reloading fluxion
when rv1_nosched is set are not fully covered by existing tests.

Check the following
- jobs are not killed when modules are reloaded (no queues configured)
- jobs are not killed when modules are reloaded (queues configured)
- no resources are double booked in those cases
- jobs submitted to the anon queue are killed when the scheduler is reloaded
  with queues configured
- jobs submitted to a named queue are killed when the scheduler is reloaded
  without queues configured

Problem: several error paths in jobmanager_hello_cb() do not
result in the function returning a failure.

I think the only ones that do trigger an error are a C++ exception
(due to the wrapper) and if queue->reconstruct () returns -1. The
callback returns 0 (success) in other cases which could result in
double booked resources.

Problem: when a hello response is received for a job with an
unknown queue, the error is not super helpful.

Instead of:
  jobmanager_hello_cb: ENOENT: map::at: No such file or directory

make it:
  jobmanager_hello_cb: unknown queue name (id=41137733632 queue=default)

This can occur when queues are reconfigured and the scheduler is reloaded,
so it's nice to have the job id and the offending queue name.
@trws (Member) commented Apr 24, 2024

Bounced the go test runner; it looks like it failed to fetch the image for some reason. Feel free to set MWP @milroy, great stuff!

@milroy (Member Author) commented Apr 24, 2024

Thanks for the detailed and helpful feedback @trws and @garlick! Setting MWP.

@milroy added the merge-when-passing label (mergify.io: merge PR automatically once CI passes) on Apr 24, 2024
@mergify mergify bot merged commit 143020d into flux-framework:master Apr 24, 2024
21 checks passed
@milroy milroy deleted the rv1-update branch April 24, 2024 17:21

Successfully merging this pull request may close these issues.

Fluxion can't restart running jobs with match-format rv1_nosched