Rv1 attributes section will be deprecated #1108
Comments
I can't think why this would be required. In the hello handshake where only R is provided, the resources need to be re-allocated immediately so no queueing should be needed. Also, the resources are already selected at that point so the queue constraint would be ignored.
I agree, but I attempted to remove the code that adds the queue "meta" attribute to jobs and this caused
diff --git a/resource/traversers/dfu_impl_update.cpp b/resource/traversers/dfu_impl_update.cpp
index 2abb2a22..3c05e325 100644
--- a/resource/traversers/dfu_impl_update.cpp
+++ b/resource/traversers/dfu_impl_update.cpp
@@ -568,12 +568,6 @@ int dfu_impl_t::update (vtx_t root, std::shared_ptr<match_writers_t> &writers,
m_err_msg += __FUNCTION__;
m_err_msg += ": emit_tm returned -1.\n";
}
- if (jobmeta.is_queue_set ()) {
- if (writers->emit_attrs ("queue", jobmeta.get_queue ()) == -1) {
- m_err_msg += __FUNCTION__;
- m_err_msg += ": emit_attrs returned -1.\n";
- }
- }
}
return (rc > 0)? 0 : -1;
@@ -632,12 +626,6 @@ int dfu_impl_t::update (vtx_t root, std::shared_ptr<match_writers_t> &writers,
m_err_msg += __FUNCTION__;
m_err_msg += ": emit_tm returned -1.\n";
}
- if (jobmeta.is_queue_set ()) {
- if (writers->emit_attrs ("queue", jobmeta.get_queue ()) == -1) {
- m_err_msg += __FUNCTION__;
- m_err_msg += ": emit_attrs returned -1.\n";
- }
- }
}
return (rc > 0)? 0: -1;
Problem: when the fluxion modules are reloaded with running jobs, the jobs are killed if the match format is set to "rv1_nosched" and queues are enabled. hello_cb(), which informs the scheduler of jobs that are still holding resources from before the reload, looks for a special key in R (attributes.system.scheduler.queue). If set, it calls queue->reconstruct() on the named queue. If not set, it calls queue->reconstruct() on the default queue. This key is not being set by fluxion, at least when "rv1_nosched" is in use. Since there has been a proposal to deprecate that key anyway (flux-framework#1108), and since the queue must only be utilized as a container for the job since its resource request has already been fulfilled, put the job in a random queue - the first one returned by the ctx->queues iterator. Note that there should always be at least one queue in the map. If no named queues are defined, a default one is instantiated.
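The fallback described in this commit message can be sketched as follows. This is a minimal illustration, not the actual qmanager code: `queue_t`, `queue_map_t`, and `pick_queue` are hypothetical names, and the choice to fall back whenever the name is empty or unknown is an assumption made for the example. Note that if the queues are held in a `std::map`, "the first one returned by the iterator" is the queue whose name sorts first, so the choice is arbitrary rather than random.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a qmanager queue object.
struct queue_t {
    explicit queue_t (std::string n) : name (std::move (n)) {}
    std::string name;
};

// Hypothetical stand-in for the ctx->queues container.
using queue_map_t = std::map<std::string, std::shared_ptr<queue_t>>;

// Return the queue named in R if it exists. Otherwise fall back to the
// first queue the map iterator yields (assumption: the job only needs
// a container, since its resources are already allocated).
std::shared_ptr<queue_t> pick_queue (const queue_map_t &queues,
                                     const std::string &name_from_r)
{
    // The commit message states there is always at least one queue.
    assert (!queues.empty ());
    auto it = queues.find (name_from_r);
    if (it == queues.end ())
        it = queues.begin ();
    return it->second;
}
```

With queues named "batch" and "debug", an empty or unknown name resolves to "batch", while "debug" resolves to itself.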
Problem: when the fluxion modules are reloaded with running jobs, and the match format is "rv1_nosched", and queues are enabled, running jobs are killed with a fatal scheduler-restart exception. During the hello handshake defined by RFC 27, the scheduler is informed during its initialization of jobs that are holding resources. The hello callback in qmanager retrieves the job's queue name from the R key "attributes.system.scheduler.queue". The queue name is used to locate the proper queue for the job to be inserted into. This attribute is not being set in R when the match format is "rv1_nosched" so the default queue is assumed. Since the default queue is not instantiated when named queues are defined, a fatal job exception is raised when the queue lookup fails. There has been a proposal to deprecate the R attribute (flux-framework#1108), so rather than ensure that it is set in this case, determine the queue instead by fetching the jobspec from the KVS. Fixes flux-framework#1108
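The failure mode described above can be modelled with a small sketch. It is illustrative only: `resolve_queue` and `reload_scenario` are hypothetical names, the set of strings stands in for the instantiated queues, and the jobspec fetch from the KVS is reduced to passing in a queue name that is assumed to have been read from the jobspec already.

```cpp
#include <cassert>
#include <set>
#include <string>

// Resolve the queue a running job should be re-inserted into.
// An empty `requested` name means the source (R or jobspec) did not
// carry a queue, so the default queue is assumed.
// Returns the resolved name, or an empty string if the lookup fails
// (the case that leads to the fatal job exception).
std::string resolve_queue (const std::set<std::string> &queues,
                           const std::string &requested)
{
    const std::string name = requested.empty () ? "default" : requested;
    return queues.count (name) ? name : std::string ();
}

// Walk through the reload scenario from the commit message.
void reload_scenario ()
{
    // Named queues are configured, so no "default" queue exists.
    const std::set<std::string> named = {"batch", "debug"};
    // R lacks the attribute under "rv1_nosched": lookup fails.
    assert (resolve_queue (named, "").empty ());
    // Queue name taken from the jobspec instead: lookup succeeds.
    assert (resolve_queue (named, "batch") == "batch");

    // No named queues: only the default queue is instantiated.
    const std::set<std::string> unnamed = {"default"};
    assert (resolve_queue (unnamed, "") == "default");
}
```

The sketch shows why the source of the queue name matters: the lookup logic is the same in both cases, and only the jobspec reliably supplies a name that matches an instantiated queue.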
flux-framework/rfc#402 proposes to remove the optional attributes section from Rv1. Fluxion currently uses this section to store the queue in attributes.system.scheduler.queue, though it isn't clear why this is needed since this duplicates the queue set in current jobspec.

Additionally, the attributes section does not appear to be set on our production systems with configured queues. Perhaps this is a holdover from before flux-core supported queues and Fluxion had its own queue implementation?

In any event, if stashing the queue in R is still necessary, there is already the opaque scheduling key provided for the scheduler's use which is perhaps better used for this purpose.