Set save_intermediate_files to false by default #52
Merged: Faizal-Eeman merged 3 commits into main from mmootor-save_intermediate_files-default-to-false on Jul 21, 2022
+2 −2
Not sure if we need this argument - it goes to pipeline-call-sSV/pipeline/config/methods.config (line 187 in 4da6d8b). But we also have cache=true at L37.
Interesting point Taka!
As I understand from the Nextflow documentation on process cache, this allows users to resume the pipeline when a process fails and needs to continue after the failure is fixed. By default, cache = true.
But since we are also specifying process.cache in methods.config by using cache_intermediate_pipeline_steps = false from default.config, shouldn't this technically introduce a conflict, as we have two boolean values for the same process cache argument? One from L37 cache_intermediate_pipeline_steps=false and another from L37 cache=true? Unless cache_intermediate_pipeline_steps=false overrides the cache=true argument and we are setting the pipeline to not cache any process files.
So far the pipeline runs smoothly, and when it fails because of an error, I haven't seen Nextflow's recommendation in the logs to resume the pipeline after fixing the error. So this means process.cache is taken as false. I can comment out cache=true and check whether the pipeline runs and delivers results as expected.
Generally, for common variables like process.cache, there's a hierarchy of values, where for example the process-specific settings override the general process-level settings. In this case though, since it's the same directive in the same scope, the value just gets overwritten and the latest setting is the one that's kept.
Also, as a note, the resume option wouldn't work with the submission script since the working directory gets deleted upon node shutdown. It should work if testing interactively though.
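A minimal sketch of the override behaviour described in the comment above. This is a hypothetical config fragment, not copied from call-sSV's default.config or methods.config: the process name is made up, and the assumption that the params-driven assignment is evaluated last is only for illustration.

```groovy
// Hypothetical nextflow.config fragment for illustration only;
// not taken from call-sSV's default.config or methods.config.

params {
    // Assumed to mirror the parameter named in this thread.
    cache_intermediate_pipeline_steps = false
}

process {
    // General process-level setting.
    cache = true

    // A process-specific selector takes precedence over the general
    // setting for the matching process ('example_process' is a made-up name).
    withName: 'example_process' {
        cache = 'lenient'
    }
}

// Same directive, same scope, assigned again later: this value replaces
// the earlier `cache = true`, so the general process.cache ends up false.
process.cache = params.cache_intermediate_pipeline_steps
```

If the real configs follow this pattern, running nextflow config in the pipeline directory should print the resolved settings, which is one way to check which value of process.cache is actually kept.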
Understood.
@yashpatel6 So cache_intermediate_pipeline_steps=false overwrites cache=true? Then should we just leave these arguments as they are, or would you recommend dropping cache=true?
This is great. We should post this to the NF WG repo GH discussions with some links (maybe I didn't google enough). So, that means we can just set cache = false as the default for most cases, which might improve run time, etc. for large samples like call-gSNP + large WGS N/T samples?
Thanks so much @yashpatel6, very helpful!
@tyamaguchi-ucla So can we hold off on deciding what to do with the cache settings for now, and perhaps make changes after the call-sSV 3.0.0 release?
Yup, let's discuss this at the NF WG. We can finish up #46 and then release a new version. We generally want to discuss changes that require a template update. Also, it might be helpful for you to understand how we developed the current config structure. @yashpatel6 maybe we can create a GH discussion pointing to some key PRs (template and call-FusionTranscript) for new developers?
We could; I'm not sure how big of a difference it would make, since the caching is basically just saving some index keys, but it's worth testing with a pipeline like call-gSNP for sure.
I'm going through some old PRs so we can make a thread with some of the key decisions/discussions we had regarding pipelines; we can review some at the NF WG next week.
Thanks, Yash.
Yeah, it might be more dependent on the number of jobs. In either case, it's worth checking.