Remove double process execution monitoring layers #21

Open
fmigneault opened this issue Mar 15, 2019 · 0 comments
Labels
feature/CWL Issue related to CWL support feature/job Issues related to job execution, reporting and logging. process/wps1 Issue related to WPS 1.x processes support process/wps3 Issue related to WPS 3.x (REST-JSON) processes support project/DACCS Related to DACCS project (https://github.com/orgs/DACCS-Climate)

fmigneault commented Mar 15, 2019

Context & Feature

At the moment, processes are executed by monitoring the WPS-1 endpoint provided by pywps through an owslib.wps.WebProcessingService client. pywps in turn monitors the sub-job execution, which creates unnecessary layers of job monitoring and makes debugging and logging harder.

A single celery job monitor should be implemented that directly monitors the real job execution of the process (cwltool). In addition, because celery only monitors the status/result of the WPS Execute request sent to pywps, which resides under the same app as the API, the cwltool operations are actually executed by a worker thread of the API (weaver manager) instead of a celery worker. This is not the desired behavior: it cannot be scaled and doesn't make use of the full worker/queue capabilities.

The full stack of process/monitor is as follows:

  1. [EMS] POST Weaver /job, which submits the job to the db
  2. A celery worker picks up the job, calls WPSExecute on PyWPS, and monitors the status until success/failure
  3. PyWPS receives the job (Weaver manager API side) and creates its internal workers to run it
  4. The PyWPS handler runs the job (cwltool or remote WPS)

In the case of an EMS workflow, this whole stack is repeated for each step, since the underlying ADES is called the same way.
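As a rough illustration of the single-monitor idea, the sketch below shows a worker function that runs the process itself and records each status change on the job, instead of polling another service. Everything in it is hypothetical: the status names, the `run_job` function, the job dictionary layout, and the `runner` callable (which stands in for the real cwltool or remote WPS call) are assumptions for the example, not existing Weaver code. In practice, the body of `run_job` would be what the celery task executes.

```python
from typing import Any, Callable, Dict, List

# Hypothetical status names; the actual job statuses used by Weaver may differ.
ACCEPTED, RUNNING, SUCCEEDED, FAILED = "accepted", "running", "succeeded", "failed"


def run_job(job: Dict[str, Any], runner: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Run the job in the current worker and record every status change on it.

    `runner` stands in for the real execution (e.g. a call into cwltool or a
    remote WPS); it is an assumption of this sketch, not an existing function.
    """
    history: List[str] = job.setdefault("history", [ACCEPTED])
    job["status"] = RUNNING
    history.append(RUNNING)
    try:
        # The worker executes the process directly: no intermediate service to poll.
        job["result"] = runner(job["inputs"])
        job["status"] = SUCCEEDED
    except Exception as exc:  # any execution error ends the job as failed
        job["error"] = str(exc)
        job["status"] = FAILED
    history.append(job["status"])
    return job
```

With this shape, the status a client sees comes from the one worker that actually ran the process, so there is only one place to look when debugging or reading logs.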

Considerations

We need to consider what to do about validation of WPS I/O. This was the reason the pywps layer was originally employed.

  • Since the executed cwltool or the remote WPS service already validates the received inputs against the package/process definition, we could simply report the invalid input they detect when the ADES/EMS attempts to execute the process. We would only need to return this error from the job execution, and no preemptive validation of I/O would be done.
  • Other approaches?
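The first option could look roughly like the sketch below: inputs are forwarded without any preliminary check, and whatever the executor rejects is returned as the job error. The names `InvalidInputError`, `execute_package` and `submit` are made up for this example, and `execute_package` is only a stand-in for the validation that cwltool or a remote WPS would perform on its own side.

```python
from typing import Any, Dict


class InvalidInputError(ValueError):
    """Hypothetical error raised by the executor when an input is rejected."""


def execute_package(definition: Dict[str, type], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for cwltool or a remote WPS: it validates the inputs itself."""
    for name, expected in definition.items():
        if name not in inputs:
            raise InvalidInputError(f"missing input '{name}'")
        if not isinstance(inputs[name], expected):
            raise InvalidInputError(f"input '{name}' must be {expected.__name__}")
    return {"echo": inputs}


def submit(definition: Dict[str, type], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """No preemptive check: forward the inputs and report what the executor rejects."""
    try:
        return {"status": "succeeded", "outputs": execute_package(definition, inputs)}
    except InvalidInputError as exc:
        return {"status": "failed", "message": f"invalid input: {exc}"}
```

The trade-off is that an invalid request is only rejected once a worker attempts the execution, rather than at submission time.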

The WPS-1/2 endpoint with pywps should be preserved for the sole purpose of providing backward compatibility with WPS-1/2, by redirecting the job submission just as the WPS-3 REST API does, using the following correspondence (see also #126):

  • GetCapabilities => GET /processes
  • DescribeProcess => GET /processes/{id}
  • Job Execute => POST /processes/{id}/jobs
  • Job GetStatus (n/a) => use GET /processes/{id}/jobs/{id} instead
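The correspondence above can be written down as a small lookup table. This is only a sketch: the `WPS_TO_REST` table and the `wps_to_rest` helper are hypothetical names, and matching the operation name case-insensitively is a leniency chosen for the example rather than something required by the issue.

```python
from typing import Dict, Tuple

# WPS-1/2 operation -> (HTTP method, REST path template), as listed above.
WPS_TO_REST: Dict[str, Tuple[str, str]] = {
    "getcapabilities": ("GET", "/processes"),
    "describeprocess": ("GET", "/processes/{id}"),
    "execute": ("POST", "/processes/{id}/jobs"),
}


def wps_to_rest(request: str, identifier: str = "") -> Tuple[str, str]:
    """Translate a WPS-1/2 `request` parameter into its REST equivalent."""
    try:
        method, template = WPS_TO_REST[request.lower()]
    except KeyError:
        raise ValueError(f"unsupported WPS operation: {request}") from None
    if "{id}" in template and not identifier:
        raise ValueError(f"{request} requires a process identifier")
    return method, template.replace("{id}", identifier)
```

For example, `wps_to_rest("Execute", "my-process")` gives the method and path that the WPS-1/2 endpoint would redirect the job submission to.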

With celery>=4.3, task results support the result_extended option, which stores additional metadata about the task (received inputs, function name, etc.). This should be enabled to get better tracking of executed/pending tasks.
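A minimal way to turn this on, assuming the celery app loads its settings from a Python configuration module (the module name `celeryconfig` is only an example and may not match how Weaver configures celery):

```python
# celeryconfig.py (hypothetical module name)
# Loaded by the celery app with: app.config_from_object("celeryconfig")

# Requires celery>=4.3. Stores extended task attributes (such as the task
# name, args and kwargs) in the result backend along with the result itself.
result_extended = True
```

The extra attributes are only written if a result backend is configured, so this setting has no visible effect without one.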

Helpful References

@fmigneault fmigneault added process/wps1 Issue related to WPS 1.x processes support triage/feature New requested feature. process/wps3 Issue related to WPS 3.x (REST-JSON) processes support labels Mar 15, 2019
@fmigneault fmigneault added the project/DACCS Related to DACCS project (https://github.com/orgs/DACCS-Climate) label May 11, 2020
@github-actions github-actions bot added triage/bug Something isn't working and removed triage/feature New requested feature. labels May 20, 2020
@github-actions github-actions bot added the feature/CWL Issue related to CWL support label May 20, 2020
@fmigneault fmigneault removed the triage/bug Something isn't working label May 20, 2020
@github-actions github-actions bot added the triage/bug Something isn't working label May 21, 2020
@fmigneault fmigneault added feature/job Issues related to job execution, reporting and logging. and removed triage/bug Something isn't working labels May 21, 2020
fmigneault added a commit that referenced this issue Feb 2, 2021