Use workflow management library #101
There are three input arrows to DL1 datacheck subrun wise. The two on the left actually correspond to a single DL1 file, the one which comes out of lstchain_dl1ab, correct? (even though the DL1a data it contains is a mere copy of the DL1a data produced in the r0_to_dl1 step). This is a bit confusing, because the output files of r0_to_dl1 are not processed by the data check. I think the sketch would be clearer if the central incoming arrow in the "DL1 datacheck subrun wise" box were removed.
You may also consider the Common Workflow Language, which integrates with several tools that use SLURM. On their page I see Arvados, Toil and StreamFlow.
@vuillaut any preference on which framework we should go for? I agree that this would be a good time for joining forces.
I am currently beta-testing CWL. |
In order to properly handle the data workflow, there are several options: (Sci)Luigi, Snakemake, Airflow, etc.
In SciLuigi the workflow can be defined through classes with `requires`, `run` and `output` methods, which makes it possible to build pipelines. Besides, it integrates the SLURM scheduler.

In the following flowchart, the desired data flow is defined. Subrun-wise jobs are in orange; run-wise jobs and files are in light purple. Currently, to merge files on a run basis we have to check that all previous jobs were successfully completed. This dependency would come naturally with a workflow management system.
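As an illustration, here is a minimal sketch of the `requires`/`run`/`output` pattern in plain Luigi (the same interface SciLuigi builds on). The task names, file patterns and number of subruns are hypothetical placeholders, not the actual lstchain steps:

```python
# Minimal sketch of the requires/run/output pattern in plain Luigi.
# Task names and paths are hypothetical; the point is that the run-wise
# merge declares its dependency on every subrun-wise job explicitly.
import luigi


class R0toDL1(luigi.Task):
    run_id = luigi.IntParameter()
    subrun = luigi.IntParameter()

    def output(self):
        return luigi.LocalTarget(f"dl1_run{self.run_id:05d}.{self.subrun:04d}.h5")

    def run(self):
        # placeholder for the actual subrun-wise processing step
        with self.output().open("w") as f:
            f.write("dl1 subrun data\n")


class MergeDL1Run(luigi.Task):
    run_id = luigi.IntParameter()
    n_subruns = luigi.IntParameter()

    def requires(self):
        # the run-wise merge only starts once every subrun job is done
        return [R0toDL1(run_id=self.run_id, subrun=s) for s in range(self.n_subruns)]

    def output(self):
        return luigi.LocalTarget(f"dl1_run{self.run_id:05d}_merged.h5")

    def run(self):
        with self.output().open("w") as out:
            for target in self.input():
                with target.open() as f:
                    out.write(f.read())


if __name__ == "__main__":
    luigi.build([MergeDL1Run(run_id=1234, n_subruns=4)], local_scheduler=True)
```

With this pattern the scheduler, not our own bookkeeping, checks that all subrun targets exist before launching the run-wise merge.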
It is also possible to use the SLURM dependency option (which is currently used in some parts of the code).
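For reference, a hedged sketch of how such a chain could be expressed with `sbatch --dependency` from Python; the script names and number of subruns are made up for illustration:

```python
# Sketch of chaining jobs with SLURM's --dependency option via subprocess.
# dl1ab.sh and merge_run.sh are hypothetical batch scripts.
import subprocess


def sbatch(*args):
    """Submit a job with sbatch --parsable and return its job id."""
    out = subprocess.run(
        ["sbatch", "--parsable", *args], capture_output=True, text=True, check=True
    )
    return out.stdout.strip().split(";")[0]


# submit one job per subrun
subrun_job_ids = [sbatch("dl1ab.sh", str(subrun)) for subrun in range(4)]

# the run-wise merge starts only after all subrun jobs finish successfully
dependency = "afterok:" + ":".join(subrun_job_ids)
sbatch(f"--dependency={dependency}", "merge_run.sh")
```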