Remove double process execution monitoring layers #21
Labels
feature/CWL
Issue related to CWL support
feature/job
Issues related to job execution, reporting and logging.
process/wps1
Issue related to WPS 1.x processes support
process/wps3
Issue related to WPS 3.x (REST-JSON) processes support
project/DACCS
Related to DACCS project (https://github.com/orgs/DACCS-Climate)
Context & Feature
At the moment, processes are executed via monitoring the WPS-1 endpoint provided by `pywps` with an `owslib.wps.WebProcessingService`. This `pywps` also monitors the sub-job execution, creating unnecessary layers of job monitoring, harder debugging, and complicated logging.

A single `celery` job monitor should be implemented, directly monitoring the real job execution of the process (`cwltool`). Also, because `celery` only monitors the status/result of the WPS Execution sent to `pywps`, which resides under the same app as the API, the `cwltool` operations are actually executed by a thread-worker of the API (weaver manager) instead of a celery worker. This is simply not the desired behavior, cannot be scaled, and doesn't use full worker/queue capabilities.

The full stack of process/monitor is as follows:
In the case of an EMS workflow, this whole stack is repeated for the underlying ADES, which is called the same way for each step.
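A minimal sketch of what a single monitoring layer could look like, assuming the real execution and the status reporting are passed in as plain callables. The names `monitor_job`, `run`, `report` and the `Status` values are hypothetical and are not taken from the weaver code base. In an actual deployment, the function body would presumably live inside a celery task and `run` would invoke `cwltool`; both are replaced by stand-ins here to keep the example self-contained.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Status(str, Enum):
    # Hypothetical job states, for illustration only.
    ACCEPTED = "accepted"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def monitor_job(
    run: Callable[[Dict[str, Any]], Dict[str, Any]],
    inputs: Dict[str, Any],
    report: Callable[[Status, str], None],
) -> Dict[str, Any]:
    """Run one job and report its status from a single place.

    `run` stands in for the real execution (e.g. a cwltool invocation) and
    `report` for whatever persists the job status and logs.
    """
    report(Status.ACCEPTED, "job received")
    try:
        report(Status.RUNNING, "executing package")
        outputs = run(inputs)
    except Exception as exc:
        # Any error raised by the runner is reported as the job failure,
        # with no intermediate monitoring layer in between.
        report(Status.FAILED, f"{type(exc).__name__}: {exc}")
        return {"status": Status.FAILED.value, "error": str(exc)}
    report(Status.SUCCEEDED, "job complete")
    return {"status": Status.SUCCEEDED.value, "outputs": outputs}
```

Because the runner is called directly, every status update and error message originates from one function, which is the property the single-monitor design is after.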
Considerations
We need to consider what to do about validation of WPS I/O. This was the reason the `pywps` layer was originally employed. `cwltool` or the remote WPS service being called already do this kind of validation on their side by validating the received inputs against the package/process definition, so we could simply report the invalid input received by them when the ADES/EMS attempts to execute it. We would only need to return this error from the job execution, but no preemptive validation of I/O would be done.

The WPS-1/2 endpoint with `pywps` should be preserved for the sole purpose of providing back-compatibility with WPS-1/2 by redirecting the job submission just as the WPS-3 REST-API does, using the following corresponding routes instead (see also #126):

- `GET /processes`
- `GET /processes/{id}`
- `POST /processes/{id}/jobs`
- `GET /processes/{id}/jobs/{id}`

With `celery>=4.3`, the Task Result has a `result_extended` option, which allows storing additional metadata about the received inputs, function name, etc. This should be enabled to have even better tracking of executed/pending tasks.
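As a hedged illustration of enabling the extended task results, the setting can be placed in a plain Python configuration module. The module name `celeryconfig` is an assumption; how weaver actually loads its celery settings is not stated in this issue.

```python
# celeryconfig.py (hypothetical module name)
# Loaded by the application with: app.config_from_object("celeryconfig")

# Store extended task attributes (name, args, kwargs, worker, retries, queue,
# delivery_info) with each result in the backend. Requires celery>=4.3.
result_extended = True
```

The same value can also be set on an existing application object through `app.conf.result_extended = True`.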