From bd542a77b728de70782636edc8dc3eb4b3728615 Mon Sep 17 00:00:00 2001
From: sreichl <reichl.stephan@gmail.com>
Date: Tue, 3 Dec 2024 18:49:53 +0100
Subject: [PATCH] doc recipe and end-to-end usage #34

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 4c9a4c8..43b8cda 100644
--- a/README.md
+++ b/README.md
@@ -74,10 +74,7 @@ Three components are required to use a module within your Snakemake workflow (i.
 # 📜 Recipes
 > _"Civilization advances by extending the number of important operations which we can perform without thinking of them."_ - Alfred North Whitehead, author of _Principia Mathematica_
 
-**Recipes** are combinations of existing modules into end-to-end best practice analyses. They can be used as templates for standard analyses by leveraging existing modules, thereby enabling fast iterations and progression into the unknown. Every recipe is described and presented using a [wiki](https://github.com/epigen/MrBiomics/wiki) page by application to a public data set.
-
-> [!TIP]
-> Process each dataset module by module. Check the results of each module to inform the configuration of the next module. This iterative method allows for quick initial completion, followed by refinement in subsequent iterations based on feedback from yourself or collaborators. Adjustments in later iterations are straightforward, requiring only changes to individual configurations or annotations. Ultimately you end up with a reproducible and readable end-to-end analysis for each dataset.
+**Recipes** are combinations of existing modules into end-to-end best practice analyses. They can be used as templates for standard analyses by leveraging existing modules, thereby enabling fast iterations and progression into the unknown. In this sense, recipes represent "functional knowledge management". Every recipe is described and presented using a [wiki](https://github.com/epigen/MrBiomics/wiki) page by application to a publicly available dataset.
 
 | Recipe | Description | # Modules | Results |
 | :---: | :---: | :---: | :---: |
@@ -87,6 +84,11 @@ Three components are required to use a module within your Snakemake workflow (i.
 | [scRNA-seq Analysis](../../wiki/scRNAseq-Analysis-Recipe) | From count matrix to enrichemnts of differentially expressed genes. | 5(-6) | ... |
 | [scCRISPR-seq Analysis](../../wiki/scCRISPRseq-Analysis-Recipe) | From count matrix to knockout phenotype enrichemnts. | 6(-7) | ... |
 
+**Usage:** Process each dataset module by module. Check the results of each module to inform the configuration of the next module. This iterative method allows for quick initial completion, followed by refinement in subsequent iterations based on feedback from yourself or collaborators. Adjustments in later iterations are straightforward, requiring only changes to individual configurations or annotations. Ultimately, you end up with a reproducible and readable end-to-end analysis for each dataset.
+
+> [!IMPORTANT]
+> For end-to-end analysis, all configuration and annotation files must exist before (re)running the workflow. If a module requires an annotation file generated by a previous module and that file does not exist (yet), DAG construction fails with a `MissingInputException`. This can happen in workflows where the configuration of a downstream module relies on outputs from preceding modules. For example, a sample annotation file created by the `atacseq_pipeline` module is used to configure the downstream `spilterlize_integrate` module. The best practice is to run the workflow module by module and save the outputs required for configuration (e.g., sample annotation) externally, preferably in the workflow's `config/` folder of the respective dataset. This makes the workflow re-runnable for future end-to-end execution, for iterations with changed parameters, and for reproducibility. Checkpoints are not a solution, as they only apply to rules, not to modules.
+
 > [!NOTE]  
 > ⭐️ **Star this repository and share recipes you find valuable** 📤 — help others find them, and guide our future work!