Linting.
tkphd committed May 2, 2024
1 parent 85ff721 commit 6b1c822
Showing 6 changed files with 130 additions and 51 deletions.
25 changes: 18 additions & 7 deletions episodes/01-introduction.md
@@ -5,13 +5,16 @@ exercises: 30
---

::: questions

- "How do I run a simple command with Snakemake?"

:::

:::objectives

- "Create a Snakemake recipe (a Snakefile)"
-:::

+:::

## What is the workflow I'm interested in?

@@ -37,6 +40,7 @@ which prints out the name of the host where the command is executed:
```bash
[ocaisa@node1 ~]$ hostname
```

```output
node1.int.jetstream2.hpc-carpentry.org
```
@@ -74,7 +78,7 @@ rule hostname_login:
1. We named the rule `hostname_login`. You may use letters, numbers or
underscores, but the rule name must begin with a letter and may not be a
keyword.
-1. The keywords `input`, `output`, `shell` are all followed by a colon.
+1. The keywords `input`, `output`, and `shell` are all followed by a colon (":").
1. The file names and the shell command are all in `"quotes"`.
1. The output filename is given before the input filename. In fact, Snakemake
doesn't care what order they appear in but we give the output first
@@ -85,10 +89,10 @@ rule hostname_login:
:::
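
The naming rule in the list above can be sketched as a small check. This is a minimal illustration, not Snakemake's own validation: `looks_like_valid_rule_name` is a hypothetical helper, and it assumes that "keyword" can be approximated by Python's reserved words (Snakemake reserves additional words of its own, which are not checked here).

```python
import keyword
import re

# Letters, numbers or underscores, beginning with a letter.
_RULE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def looks_like_valid_rule_name(name):
    """Hypothetical check: pattern matches and name is not a Python keyword."""
    return bool(_RULE_NAME.fullmatch(name)) and not keyword.iskeyword(name)


for candidate in ("hostname_login", "1st_rule", "host-name", "class"):
    print(candidate, looks_like_valid_rule_name(candidate))
```

Only `hostname_login` passes: the other names start with a digit, contain a hyphen, or collide with a reserved word.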

Back in the shell we'll run our new rule. At this point, if there were any
-missing quotes, bad indents, etc. we may see an error.
+missing quotes, bad indents, etc., we may see an error.

```bash
-$ snakemake -j1 -p hostname_login
+snakemake -j1 -p hostname_login
```

::: callout
@@ -98,6 +102,7 @@ $ snakemake -j1 -p hostname_login
If your shell tells you that it cannot find the command `snakemake` then we need
to make the software available somehow. In our case, this means searching for
the module that we need to load:

```bash
module spider snakemake
```
@@ -122,7 +127,6 @@ Names marked by a trailing (E) are extensions provided by another module.
$ module spider snakemake/8.2.1
--------------------------------------------------------------------------------------------------------
```

Now we want the module, so let's load that to make the package available
@@ -136,9 +140,14 @@ and then make sure we have the `snakemake` command available
```bash
[ocaisa@node1 ~]$ which snakemake
```

```output
/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/snakemake/8.2.1-foss-2023a/bin/snakemake
```

```bash
snakemake -j1 -p hostname_login
```
:::

::: challenge
@@ -152,8 +161,10 @@ What does the `-p` option in the `snakemake` command above do?
1. Tells Snakemake to only run one process at a time
1. Prompts the user for the correct input file

-*Hint: you can search in the text by pressing `/`, and quit back to the shell
-with `q`*
+:::::: hint
+You can search in the text by pressing <kbd>/</kbd>,
+and quit back to the shell with <kbd>q</kbd>.
+::::::

:::::: solution
(2) Prints the shell commands that are being run to the terminal
31 changes: 23 additions & 8 deletions episodes/02-snakemake_on_the_cluster.md
@@ -39,7 +39,7 @@ the 'last modification time' of both the target and its dependencies. If any
dependency has been updated since the target, then the actions are re-run to
update the target. Using this approach, Snakemake knows to only rebuild the
files that, either directly or indirectly, depend on the file that changed. This
-is called an _incremental build_.
+is called an _incremental build_.

::: callout
## Incremental Builds Improve Efficiency
@@ -48,7 +48,6 @@ By only rebuilding files when required, Snakemake makes your processing
more efficient.
:::
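
The timestamp comparison behind an incremental build can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Snakemake's implementation (which may take more than modification times into account); `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as workdir:
    dep = os.path.join(workdir, "input.txt")
    out = os.path.join(workdir, "output.txt")
    for path in (dep, out):
        with open(path, "w") as handle:
            handle.write("data\n")

    # Set explicit timestamps so the comparison does not depend on timing.
    os.utime(dep, (1000, 1000))
    os.utime(out, (2000, 2000))
    up_to_date = not needs_rebuild(out, [dep])  # target is newer than its input

    os.utime(dep, (3000, 3000))
    stale = needs_rebuild(out, [dep])  # input was modified after the target

    missing = needs_rebuild(os.path.join(workdir, "absent.txt"), [dep])

print(up_to_date, stale, missing)
```

A target is rebuilt when it does not exist yet or when any of its dependencies carries a later modification time; otherwise it is left alone.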


::: challenge
## Running on the cluster

@@ -57,7 +56,7 @@ a new rule in your Snakefile and try to execute it on cluster with the option
`--executor slurm` to `snakemake`.

:::::: solution
-The rule is almost identical to the previous rule save for the rule name and
+The rule is almost identical to the previous rule save for the rule name and
output file:

```python
@@ -66,12 +65,14 @@ rule hostname_remote:
input:
shell:
"hostname > hostname_remote.txt"

```

You can then execute the rule with

```bash
[ocaisa@node1 ~]$ snakemake -j1 -p --executor slurm hostname_remote
```

```output
Building DAG of jobs...
Retrieving input from storage.
@@ -96,24 +97,31 @@ rule hostname_remote:
hostname > hostname_remote.txt
No SLURM account given, trying to guess.
Guessed SLURM account: def-users
-No wall time information given. This might or might not work on your cluster. If not, specify the resource runtime in your rule or as a reasonable default via --default-resources.
-No job memory information ('mem_mb' or 'mem_mb_per_cpu') is given - submitting without. This might or might not work on your cluster.
+No wall time information given. This might or might not work on your cluster.
+If not, specify the resource runtime in your rule or as a reasonable default
+via --default-resources. No job memory information ('mem_mb' or
+'mem_mb_per_cpu') is given - submitting without.
+This might or might not work on your cluster.
Job 0 has been submitted with SLURM jobid 326 (log: /home/ocaisa/.snakemake/slurm_logs/rule_hostname_remote/326.log).
[Mon Jan 29 18:04:26 2024]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2024-01-29T180346.788174.snakemake.log
```

Note all the warnings that Snakemake is giving us about the fact that the rule
may not be able to execute on our cluster as we may not have given enough
information. Luckily for us, this actually works on our cluster and we can take
a look in the output file the new rule creates, `hostname_remote.txt`:

```bash
[ocaisa@node1 ~]$ cat hostname_remote.txt
```

```output
tmpnode1.int.jetstream2.hpc-carpentry.org
```

::::::

:::
@@ -123,9 +131,11 @@ tmpnode1.int.jetstream2.hpc-carpentry.org
Adapting Snakemake to a particular environment can entail many flags and
options. Therefore, it is possible to specify a configuration profile to be used
to obtain default options. This looks like

```bash
snakemake --profile myprofileFolder ...
```

The profile folder must contain a file called `config.yaml` which is what will
store our options. The folder may also contain other files necessary for the
profile. Let's create the file `cluster_profile/config.yaml` and insert some of
@@ -141,8 +151,9 @@ We should now be able rerun our workflow by pointing to the profile rather than
listing out the options. To force our workflow to rerun, we first need to
remove the output file `hostname_remote.txt`, and then we can try out our new
profile

```bash
-[ocaisa@node1 ~]$ rm hostname_remote.txt
+[ocaisa@node1 ~]$ rm hostname_remote.txt
[ocaisa@node1 ~]$ snakemake --profile cluster_profile hostname_remote
```

@@ -168,6 +179,7 @@ The warnings given by Snakemake hinted that we may need to provide these
options. One way to do this is to provide them as part of the Snakemake rule
using the keyword `resources`,
e.g.,

```python
rule:
input: ...
@@ -176,10 +188,12 @@ rule:
partition: <partition name>
runtime: <some number>
```

and we can also use the profile to define default values for these options to
use with our project, using the keyword `default-resources`. For example, the
available memory on our cluster is about 4GB per core, so we can add that to our
profile:

```yaml
printshellcmds: True
jobs: 3
@@ -202,6 +216,7 @@ default-resources:
- mem_mb_per_cpu=3600
- runtime=2
```
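
How profile defaults and per-rule `resources` combine can be sketched in plain Python. This is an illustration under stated assumptions rather than Snakemake's own code: `parse_resources` is a hypothetical helper, the rule-level `runtime` of 10 is an invented value, and the sketch assumes that a value set on a rule takes precedence over the profile default.

```python
# Entries mirroring the `default-resources` list in the profile above.
default_resources = ["mem_mb_per_cpu=3600", "runtime=2"]


def parse_resources(entries):
    """Turn ['key=value', ...] into a dict with integer values."""
    parsed = {}
    for entry in entries:
        key, value = entry.split("=", 1)
        parsed[key] = int(value)
    return parsed


defaults = parse_resources(default_resources)

# Hypothetical rule that sets its own runtime but no memory request.
rule_resources = {"runtime": 10}

# Rule-level values win; anything the rule omits falls back to the default.
effective = {**defaults, **rule_resources}
print(effective)
```

Under these assumptions the rule would be submitted with its own `runtime` and the profile's `mem_mb_per_cpu`.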

::::::

:::
@@ -222,7 +237,7 @@ rule myrule:

Our initial rule was to
get the hostname of the login node. We always want to run that rule on the login
-node for that to make sense. If we tell Snakemake to run all rules via the
+node for that to make sense. If we tell Snakemake to run all rules via the
Slurm executor (which is what we are doing via our new profile) this
won't happen any more. So how do we force the rule to run on
the login node?
11 changes: 8 additions & 3 deletions episodes/03-placeholders.md
@@ -5,11 +5,15 @@ exercises: 30
---

::: questions

- "How do I make a generic rule?"

:::

::: objectives

- "See how Snakemake deals with some errors"

:::

Our Snakefile has some duplication. For example, the names of text
@@ -61,7 +65,6 @@ The new rule has replaced explicit file names with things in `{curly brackets}`,
specifically `{output}` (but it could also have been `{input}`...if that had
a value and were useful).


### `{input}` and `{output}` are **placeholders**

Placeholders are used in the `shell` section of a rule, and Snakemake will
@@ -73,7 +76,9 @@ file, and
`{resources}` with the notation `{resources.runtime}` (for example).
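
As a rough analogy, placeholder substitution resembles Python's `str.format`, including attribute access such as `{resources.runtime}`. The sketch below is not how Snakemake fills in a `shell` command internally; the `template` string and the values bound to `output` and `resources` are assumptions made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical values standing in for what a rule would provide.
output = "hostname_remote.txt"
resources = SimpleNamespace(runtime=2)

# A shell command template with two placeholders.
template = "echo 'runtime is {resources.runtime}' && hostname > {output}"

command = template.format(output=output, resources=resources)
print(command)
```

Each placeholder is replaced by the matching value before the command is handed to the shell, so the same template can serve rules with different outputs.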

:::keypoints

- "Snakemake rules are made more generic with placeholders"
-- "Placeholders in the shell part of the rule are replaced with values based on the chosen
-wildcards"
+- "Placeholders in the shell part of the rule are replaced with values based on
+  the chosen wildcards"

:::