diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/README.md index 9208b12..892bcb0 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/README.md @@ -1,73 +1,72 @@ -**WARNING** +# Flux Workflow Examples -This repository has been archived. It is no longer maintained and it is -likely the examples do not work or are no longer good or suggested -examples. - -Please look elswhere for examples. - -**Flux Workflow Examples** +This contents used to be hosted at [flux-framework/flux-workflow-examples](https://github.com/flux-framework/flux-workflow-examples) and has been moved here for annual updates paired with the Flux Tutorials. The examples contained here demonstrate and explain some simple use-cases with Flux, and make use of Flux's command-line interface (CLI), Flux's C library, and the Python and Lua bindings to the C library. -**Requirements** +## Requirements The examples assume that you have installed: 1. A recent version of Flux - 2. Python 3.6+ - 3. Lua 5.1+ -**_1. [CLI: Job Submission](https://github.com/flux-framework/flux-workflow-examples/tree/master/job-submit-cli)_** +You can also use an interactive container locally, binding this directory to the container: + +```bash +docker run -it -v $(pwd):/home/fluxuser/flux-workflow-examples fluxrm/flux-sched:jammy +cd /home/fluxuser/flux-workflow-examples/ +``` + +**_1. [CLI: Job Submission](job-submit-cli)_** Launch a flux instance and schedule/launch compute and io-forwarding jobs on separate nodes using the CLI -**_2. [Python: Job Submission](https://github.com/flux-framework/flux-workflow-examples/tree/master/job-submit-api)_** +**_2. [Python: Job Submission](job-submit-api)_** Schedule/launch compute and io-forwarding jobs on separate nodes using the Python bindings -**_3. [Python: Job Submit/Wait](https://github.com/flux-framework/flux-workflow-examples/tree/master/job-submit-wait)_** +**_3. [Python: Job Submit/Wait](job-submit-wait)_** Submit jobs and wait for them to complete using the Flux Python bindings -**_4. [Python: Asynchronous Bulk Job Submission](https://github.com/flux-framework/flux-workflow-examples/tree/master/async-bulk-job-submit)_** +**_4. [Python: Asynchronous Bulk Job Submission](async-bulk-job-submit)_** Asynchronously submit jobspec files from a directory and wait for them to complete in any order -**_5. [Python: Tracking Job Status and Events](https://github.com/flux-framework/flux-workflow-examples/tree/master/job-status-control)_** +**_5. [Python: Tracking Job Status and Events](job-status-control)_** Submit job bundles, get event updates, and wait until all jobs complete -**_6. [Python: Job Cancellation](https://github.com/flux-framework/flux-workflow-examples/tree/master/job-cancel)_** +**_6. [Python: Job Cancellation](job-cancel)_** Cancel a running job -**_7. [Lua: Use Events](https://github.com/flux-framework/flux-workflow-examples/tree/master/synchronize-events)_** +**_7. [Lua: Use Events](synchronize-events)_** Use events to synchronize compute and io-forwarding jobs running on separate nodes -**_8. [Python: Simple KVS Example](https://github.com/flux-framework/flux-workflow-examples/tree/master/kvs-python-bindings)_** +**_8. [Python: Simple KVS Example](kvs-python-bindings)_** Use KVS Python interfaces to store user data into KVS -**_9. 
[CLI/Lua: Job Ensemble Submitted with a New Flux Instance](https://github.com/flux-framework/flux-workflow-examples/tree/master/job-ensemble)_** +**_9. [CLI/Lua: Job Ensemble Submitted with a New Flux Instance](job-ensemble)_** Submit job bundles, print live job events, and exit when all jobs are complete -**_10. [CLI: Hierarchical Launching](https://github.com/flux-framework/flux-workflow-examples/tree/master/hierarchical-launching)_** +**_10. [CLI: Hierarchical Launching](hierarchical-launching)_** Launch a large number of sleep 0 jobs -**_11. [C/Lua: Use a Flux Comms Module](https://github.com/flux-framework/flux-workflow-examples/tree/master/comms-module)_** +**_11. [C/Lua: Use a Flux Comms Module](comms-module)_** Use a Flux Comms Module to communicate with job elements -**_12. [C/Python: A Data Conduit Strategy](https://github.com/flux-framework/flux-workflow-examples/tree/master/data-conduit)_** +**_12. [C/Python: A Data Conduit Strategy](data-conduit)_** Attach to a job that receives OS time data from compute jobs diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/async-bulk-job-submit/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/async-bulk-job-submit/README.md index 719af07..c612e51 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/async-bulk-job-submit/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/async-bulk-job-submit/README.md @@ -1,7 +1,6 @@ -## Python Asynchronous Bulk Job Submission +# Python Asynchronous Bulk Job Submission -Parts (a) and (b) demonstrate different implementations of the same basic use-case---submitting -large numbers of jobs to Flux. For simplicity, in these examples all of the jobs are identical. +Parts (a) and (b) demonstrate different implementations of the same basic use-case---submitting large numbers of jobs to Flux. For simplicity, in these examples all of the jobs are identical. In part (a), we use the `flux.job.submit_async` and `flux.job.wait` functions to submit jobs and wait for them. In part (b), we use the `FluxExecutor` class, which offers a higher-level interface. It is important to note that @@ -10,49 +9,59 @@ The executor's futures fulfill in the background and callbacks added to the futu be invoked by different threads; the `submit_async` futures do not fulfill in the background, callbacks are always invoked by the same thread that added them, and sharing the futures among threads is not supported. -### Setup - Downloading the Files +## Setup - Downloading the Files If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/async-bulk-job-submit ``` -### Part (a) - Using `submit_async` +## Part (a) - Using `submit_async` -#### Description: Asynchronously submit jobspec files from a directory and wait for them to complete in any order +### Description: Asynchronously submit jobspec files from a directory and wait for them to complete in any order 1. Allocate three nodes from a resource manager: -`salloc -N3 -ppdebug` +```bash +salloc -N3 -ppdebug +``` 2. Make a **jobs** directory: -`mkdir jobs` +```bash +mkdir /tmp/jobs +``` -3. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: +3. 
If you are running Slurm, launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: -`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` 4. Store the jobspec of a `sleep 0` job in the **jobs** directory: -`flux mini run --dry-run -n1 sleep 0 > jobs/0.json` +```bash +flux run --dry-run -n1 sleep 0 > /tmp/jobs/0.json +``` 5. Copy the jobspec of **job0** 1024 times to create a directory of 1025 `sleep 0` jobs: -``for i in `seq 1 1024`; do cp jobs/0.json jobs/${i}.json; done`` +```bash +for i in `seq 1 1024`; do cp /tmp/jobs/0.json /tmp/jobs/${i}.json; done +``` 6. Run the **bulksubmit.py** script and pass all jobspec in the **jobs** directory as an argument with a shell glob `jobs/*.json`: -`./bulksubmit.py jobs/*.json` - +```bash +./bulksubmit.py /tmp/jobs/*.json ``` +```console bulksubmit: Starting... -bulksubmit: submitted 1025 jobs in 3.04s. 337.09job/s -bulksubmit: First job finished in about 3.089s -|██████████████████████████████████████████████████████████| 100.0% (29.4 job/s) -bulksubmit: Ran 1025 jobs in 34.9s. 29.4 job/s +bulksubmit: submitted 1025 jobs in 0.43s. 2392.93job/s +bulksubmit: First job finished in about 0.521s +|██████████████████████████████████████████████████████████| 100.0% (274.3 job/s) +bulksubmit: Ran 1025 jobs in 3.7s. 274.3 job/s ``` ### Notes to Part (a) @@ -65,7 +74,7 @@ bulksubmit: Ran 1025 jobs in 34.9s. 29.4 job/s ```python if h.reactor_run() < 0: - h.fatal_error("reactor start failed") + h.fatal_error("reactor start failed") ``` The reactor will return automatically when there are no more outstanding RPC responses, i.e., all jobs have been submitted. @@ -81,19 +90,24 @@ If continuing from part (a), skip to step 3. 1. Allocate three nodes from a resource manager: -`salloc -N3 -ppdebug` +```bash +salloc -N3 -ppdebug +``` 2. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: -`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` 3. Run the **bulksubmit_executor.py** script and pass the command (`/bin/sleep 0` in this example) and the number of times to run it (default is 100): -`./bulksubmit_executor.py -n200 /bin/sleep 0` - +```bash +./bulksubmit_executor.py -n200 /bin/sleep 0 ``` -bulksubmit_executor: submitted 200 jobs in 0.45s. 441.15job/s -bulksubmit_executor: First job finished in about 1.035s -|██████████████████████████████████████████████████████████| 100.0% (24.9 job/s) -bulksubmit_executor: Ran 200 jobs in 8.2s. 24.4 job/s +```console +bulksubmit_executor: submitted 200 jobs in 0.18s. 1087.27job/s +bulksubmit_executor: First job finished in about 0.248s +|██████████████████████████████████████████████████████████| 100.0% (229.8 job/s) +bulksubmit_executor: Ran 200 jobs in 1.0s. 
199.6 job/s
 ```
diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/comms-module/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/comms-module/README.md
index 3acdc5c..6f1456c 100644
--- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/comms-module/README.md
+++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/comms-module/README.md
@@ -1,36 +1,93 @@
-### Using a Flux Comms Module
+# Using a Flux Comms Module
 
-#### Description: Use a Flux comms module to communicate with job elements
+## Description: Use a Flux comms module to communicate with job elements
 
-##### Setup
+### Setup
 
 If you haven't already, download the files and change your working directory:
 
-```
-$ git clone https://github.com/flux-framework/flux-workflow-examples.git
+```bash
 $ cd flux-workflow-examples/comms-module
 ```
 
-##### Execution
+### Execution
+
+If you need to get an allocation on Slurm:
 
-1. `salloc -N3 -ppdebug`
+```bash
+salloc -N3 -ppdebug
+```
 
-2. Point to `flux-core`'s `pkgconfig` directory:
+Point to `flux-core`'s `pkgconfig` directory:
 
 | Shell | Command |
 | ----- | ---------- |
 | tcsh | `setenv PKG_CONFIG_PATH <FLUX_INSTALL_PATH>/lib/pkgconfig` |
 | bash/zsh | `export PKG_CONFIG_PATH='<FLUX_INSTALL_PATH>/lib/pkgconfig'` |
 
-3. `make`
+This might look like this in the container:
+
+```bash
+export PKG_CONFIG_PATH=/usr/lib/pkgconfig
+```
 
-4. Add the directory of the modules to `FLUX_MODULE_PATH`; if the module was
+Then build the module (if you don't have write permission in the source tree, copy it to /tmp first):
+
+```bash
+cp -R ./comms-module /tmp/comms-module
+cd /tmp/comms-module
+make
+```
+
+Add the directory of the modules to `FLUX_MODULE_PATH`, then load them; if the modules were
 built in the current dir:
-`export FLUX_MODULE_PATH=${FLUX_MODULE_PATH}:$(pwd)`
+```bash
+export FLUX_MODULE_PATH=${FLUX_MODULE_PATH}:$(pwd)
+flux module load ioapp.so
+flux module load capp.so
+```
 
-5. `srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out`
+Now let's try it! If you need to run `flux start` under Slurm:
 
-6. `flux submit -N 2 -n 2 ./compute.lua 120`
+```bash
+srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out
+```
+
+Try running the example jobs with the modules on the path:
+
+```bash
+flux run -N 1 -n 2 ./compute.lua 120
+flux run -N 1 -n 2 ./io-forwarding.lua 120
+```
+Notice that the module is loaded (at the bottom):
 
-7. `flux submit -N 1 -n 1 ./io-forwarding.lua 120`
 
+```console
+Try `flux-module load --help' for more information. 
+Module Idle S Sendq Recvq Service +heartbeat 1 R 0 0 +resource 0 R 0 0 +job-ingest 0 R 0 0 +kvs-watch 0 R 0 0 +sched-fluxion-resource 0 R 0 0 +cron idle R 0 0 +barrier idle R 0 0 +job-exec 0 R 0 0 +job-list idle R 0 0 +kvs 0 R 0 0 +content-sqlite 0 R 0 0 content-backing +job-info 0 R 0 0 +job-manager 0 R 0 0 +sched-fluxion-qmanager 0 R 0 0 sched +content 0 R 0 0 +connector-local 0 R 0 0 1002-shell-f3Lv2Zd3tj,1002-shell-f3N2WmZB5H +ioapp 83 R 0 0 +Block until we hear go message from the an io forwarder +``` + +If you run them together, they work together: + +```bash +flux submit -N 1 -n 2 ./compute.lua 120 +flux run -N 1 -n 2 ./io-forwarding.lua 120 +``` diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/README.md index a68aedb..3a9a927 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/README.md @@ -1,29 +1,47 @@ -## A Data Conduit Strategy +# A Data Conduit Strategy -### Description: Use a data stream to send packets through +**Note that this module script does not compile and needs an update** -#### Setup + +## Description: Use a data stream to send packets through + +### Setup If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/data-conduit ``` -#### Execution +### Execution -1. Allocate three nodes from a resource manager: +If you are using Slurm, allocate three nodes from a resource manager: -`salloc -N3 -ppdebug` +```bash +salloc -N3 -ppdebug +``` -2. Point to `flux-core`'s `pkgconfig` directory: +Point to `flux-core`'s `pkgconfig` directory: | Shell | Command | | ----- | ---------- | | tcsh | `setenv PKG_CONFIG_PATH <FLUX_INSTALL_PATH>/lib/pkgconfig` | | bash/zsh | `export PKG_CONFIG_PATH='<FLUX_INSTALL_PATH>/lib/pkgconfig'` | +This might look like this in the container: + +```bash +export PKG_CONFIG_PATH=/usr/lib/pkgconfig +``` + +Then build the module (if you don't have permission, copy to /tmp) + +```bash +cp -R ./data-conduit /tmp/data-conduit +cd /tmp/data-conduit +make +``` + 3. `make` 4. 
Add the directory of the modules to `FLUX_MODULE_PATH`, if the module was built in the current directory: diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/conduit.c b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/conduit.c index 790f84f..9e6f446 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/conduit.c +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/data-conduit/conduit.c @@ -45,7 +45,7 @@ static struct conduit_ctx *getctx (flux_t *h) return ctx; } -/* Foward the received JSON string to the datastore.py */ +/* Forward the received JSON string to the datastore.py */ static int conduit_send (flux_t *h, const char *json_str) { int rc = -1; diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/README.md index 7f9c7ca..ff33747 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/README.md @@ -1,25 +1,30 @@ -## Hierarchical Launching +# Hierarchical Launching -### Description: Launch an ensemble of sleep 0 tasks +## Description: Launch an ensemble of sleep 0 tasks -#### Setup +### Setup If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/hierarchical-launching ``` -#### Execution +### Execution -1. `salloc -N3 -ppdebug` +If you need to start flux on a Slurm cluster: -2. `srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +salloc -N3 -ppdebug +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` -3. 
`./parent.sh` +Start the parent instance +```bash +./parent.sh ``` +```console Mon Nov 18 15:31:08 PST 2019 13363018989568 13365166473216 @@ -28,7 +33,6 @@ First Level Done Mon Nov 18 15:34:13 PST 2019 ``` - ### Notes - You can increase the number of jobs by increasing `NCORES` in `parent.sh` and diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/ensemble.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/ensemble.sh index 0edca81..efd987b 100755 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/ensemble.sh +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/ensemble.sh @@ -4,7 +4,7 @@ NJOBS=750 MAXTIME=$(expr ${NJOBS} + 2) for i in `seq 1 ${NJOBS}`; do - flux mini submit --nodes=1 --ntasks=1 --cores-per-task=1 sleep 0 + flux submit --nodes=1 --ntasks=1 --cores-per-task=1 sleep 0 done flux jobs diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/parent.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/parent.sh index 84ef464..19d74e3 100755 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/parent.sh +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/hierarchical-launching/parent.sh @@ -5,7 +5,7 @@ NCORES=3 date for i in `seq 1 ${NCORES}`; do - flux mini submit -N 1 -n 1 flux start ./ensemble.sh + flux submit -N 1 -n 1 flux start ./ensemble.sh done flux queue drain diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-cancel/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-cancel/README.md index af1d3b8..2c9c0cf 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-cancel/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-cancel/README.md @@ -1,25 +1,24 @@ -## Job Cancellation +# Job Cancellation -### Description: Cancel a running job +## Description: Cancel a running job -#### Setup +### Setup If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/job-cancel ``` -#### Execution - -1. Launch the submitter script: +### Execution -`./submitter.py $(flux resource list -no {ncores} --state=up)` - -_note: for older versions of Flux, you might need to instead run: `./submitter.py $(flux hwloc info | awk '{print $3}')`_ +Launch the submitter script: +```bash +python3 ./submitter.py $(flux resource list -no {ncores} --state=up) ``` + +```console Submitted 1st job: 2241905819648 Submitted 2nd job: 2258951471104 diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/README.md index 3f303ea..99361f5 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/README.md @@ -1,8 +1,8 @@ -### Job Ensemble Submitted with a New Flux Instance +# Job Ensemble Submitted with a New Flux Instance -#### Description: Launch a flux instance and submit one instance of an io-forwarding job and 50 compute jobs, each spanning the entire set of nodes. 
+## Description: Launch a flux instance and submit one instance of an io-forwarding job and 50 compute jobs, each spanning the entire set of nodes. -#### Setup +### Setup If you haven't already, download the files and change your working directory: @@ -11,47 +11,37 @@ $ git clone https://github.com/flux-framework/flux-workflow-examples.git $ cd flux-workflow-examples/job-ensemble ``` -#### Execution +### Execution -1. `salloc -N3 -ppdebug` +If you need a Slurm allocation: -2. `cat ensemble.sh` +```bash +salloc -N3 -ppdebug +# Take a look at the script first +cat ensemble.sh ``` -#!/usr/bin/env sh +Here is how to run under Slurm: -NJOBS=10 -MAXTIME=$(expr ${NJOBS} + 2) -JOBIDS="" - -JOBIDS=$(flux mini submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua ${MAXTIME}) -for i in `seq 1 ${NJOBS}`; do - JOBIDS="${JOBIDS} $(flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./compute.lua 1)" -done +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out ./ensemble.sh +``` -flux jobs -flux queue drain +Or without: -# print mock-up prevenance data -for i in ${JOBIDS}; do - echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" - echo "Jobid: ${i}" - KVSJOBID=$(flux job id --from=dec --to=kvs ${i}) - flux kvs get ${KVSJOBID}.R | jq -done +```bash +flux start -o,-S,log-filename=out ./ensemble.sh ``` -3. `srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out ./ensemble.sh` - ``` -JOBID USER NAME STATE NTASKS NNODES RUNTIME RANKS -1721426247680 fluxuser compute.lu RUN 4 2 0.122s [1-2] -1718322462720 fluxuser compute.lu RUN 4 2 0.293s [0,2] -1715201900544 fluxuser compute.lu RUN 4 2 0.481s [0-1] -1712299442176 fluxuser compute.lu RUN 4 2 0.626s [1-2] -1709296320512 fluxuser compute.lu RUN 4 2 0.885s [0,2] -1706293198848 fluxuser compute.lu RUN 4 2 1.064s [0-1] -1691378253824 fluxuser io-forward RUN 1 1 1.951s 0 +JOBID USER NAME STATE NTASKS NNODES RUNTIME +1721426247680 fluxuser compute.lu RUN 4 2 0.122s +1718322462720 fluxuser compute.lu RUN 4 2 0.293s +1715201900544 fluxuser compute.lu RUN 4 2 0.481s +1712299442176 fluxuser compute.lu RUN 4 2 0.626s +1709296320512 fluxuser compute.lu RUN 4 2 0.885s +1706293198848 fluxuser compute.lu RUN 4 2 1.064s +1691378253824 fluxuser io-forward RUN 1 1 1.951s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Jobid: 1691378253824 { diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/ensemble.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/ensemble.sh index 8c76593..9468ec6 100755 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/ensemble.sh +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-ensemble/ensemble.sh @@ -4,9 +4,9 @@ NJOBS=10 MAXTIME=$(expr ${NJOBS} + 2) JOBIDS="" -JOBIDS=$(flux mini submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua ${MAXTIME}) +JOBIDS=$(flux submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua ${MAXTIME}) for i in `seq 1 ${NJOBS}`; do - JOBIDS="${JOBIDS} $(flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./compute.lua 1)" + JOBIDS="${JOBIDS} $(flux submit --nodes=1 --ntasks=1 --cores-per-task=2 ./compute.lua 1)" done flux jobs @@ -16,6 +16,6 @@ flux queue drain for i in ${JOBIDS}; do echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" echo "Jobid: ${i}" - KVSJOBID=$(flux job id --from=dec --to=kvs ${i}) + KVSJOBID=$(flux job id --to=kvs ${i}) flux kvs get ${KVSJOBID}.R | jq done diff --git 
a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-status-control/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-status-control/README.md index bbd3704..02ad14c 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-status-control/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-status-control/README.md @@ -1,31 +1,35 @@ -## Using Flux Job Status and Control API +# Using Flux Job Status and Control API -### Description: Submit job bundles, get event updates, and wait until all jobs complete +## Description: Submit job bundles, get event updates, and wait until all jobs complete -#### Setup +### Setup If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/job-status-control ``` -#### Execution +### Execution 1. Allocate three nodes from a resource manager: -`salloc -N3 -p pdebug` +```bash +salloc -N3 -p pdebug +``` -2. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: +2. If needed, launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: -`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` 3. Run the bookkeeper executable along with the number of jobs to be submitted (if no size is specified, 6 jobs are submitted: 3 instances of **compute.py**, and 3 instances of **io-forwarding,py**): -`./bookkeeper.py 2` - +```bash +python3 ./bookkeeper.py 2 ``` +```console bookkeeper: all jobs submitted bookkeeper: waiting until all jobs complete job 39040581632 triggered event 'submit' @@ -49,8 +53,6 @@ job 39040581632 triggered event 'clean' bookkeeper: all jobs completed ``` ---- - ### Notes - The following constructs a job request using the **JobspecV1** class with customizable parameters for how you want to utilize the resources allocated for your job: diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-api/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-api/README.md index 12cf931..cfcd17e 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-api/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-api/README.md @@ -1,99 +1,108 @@ -## Job Submit API +# Job Submit API To run the following examples, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/job-submit-api ``` -### Part(a) - Using a direct job.submit RPC +## Part(a) - Using a direct job.submit RPC -#### Description: Schedule and launch compute and io-forwarding jobs on separate nodes +### Description: Schedule and launch compute and io-forwarding jobs on separate nodes 1. Allocate three nodes from a resource manager: -`salloc -N3 -p pdebug` +```bash +salloc -N3 -p pdebug +``` 2. 
Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory:
 
-`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out`
+```bash
+srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out
+```
 
 3. Run the submitter executable:
 
-`./submitter.py`
+```bash
+python3 ./submitter.py
+```
 
 4. List currently running jobs:
 
-`flux jobs`
-
+```bash
+flux jobs
 ```
-JOBID USER NAME ST NTASKS NNODES RUNTIME RANKS
-ƒ5W8gVwm moussa1 io-forward R 1 1 19.15s 2
-ƒ5Vd2kJs moussa1 compute.py R 4 2 19.18s [0-1]
+```console
+JOBID USER NAME ST NTASKS NNODES RUNTIME
+ƒ5W8gVwm fluxuser io-forward R 1 1 19.15s
+ƒ5Vd2kJs fluxuser compute.py R 4 2 19.18s
 ```
 
 5. Information about jobs, such as the submitted job specification, an eventlog, and the resource description format **R** are stored in the KVS. The data can be queried via the `job-info` module via the `flux job info` command. For example, to fetch **R** for a job which has been allocated resources:
 
-`flux job info ƒ5W8gVwm R`
-
+```bash
+flux job info ƒ5W8gVwm R
 ```
+```console
 {"version":1,"execution":{"R_lite":[{"rank":"2","children":{"core":"0"}}]}}
 ```
-
-`flux job info ƒ5Vd2kJs R`
-
+```bash
+flux job info ƒ5Vd2kJs R
 ```
+```console
 {"version":1,"execution":{"R_lite":[{"rank":"0-1","children":{"core":"0-3"}}]}}
 ```
 
-### Part(b) - Using a direct job.submit RPC
+## Part(b) - Using a direct job.submit RPC
 
-#### Description: Schedule and launch both compute and io-forwarding jobs across all nodes
+### Schedule and launch both compute and io-forwarding jobs across all nodes
 
 1. Allocate three nodes from a resource manager:
 
-`salloc -N3 -p pdebug`
+```bash
+salloc -N3 -p pdebug
+```
 
 2. Launch another Flux instance on the current allocation:
 
-`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out`
+```bash
+srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out
+```
 
 3. Run the second submitter executable:
 
-`./submitter2.py`
+```bash
+python3 ./submitter2.py
+```
 
 4. List currently running jobs:
 
-`flux jobs`
-
+```bash
+flux jobs
 ```
-JOBID USER NAME ST NTASKS NNODES RUNTIME RANKS
-ƒctYadhh moussa1 io-forward R 3 3 3.058s [0-2]
-ƒct1StnT moussa1 compute.py R 6 3 3.086s [0-2]
+```console
+JOBID USER NAME ST NTASKS NNODES RUNTIME
+ƒctYadhh fluxuser io-forward R 3 3 3.058s
+ƒct1StnT fluxuser compute.py R 6 3 3.086s
 ```
 
 5. Fetch **R** for the jobs that have been allocated resources:
 
-`flux job info ƒctYadhh R`
-
+```bash
+flux job info $(flux job last) R
+flux job info $(flux job last) jobspec
 ```
+```console
 {"version":1,"execution":{"R_lite":[{"rank":"0-2","children":{"core":"0-3"}}]}}
 ```
-
-`flux job info ƒct1StnT R`
-
-```
-{"version":1,"execution":{"R_lite":[{"rank":"0-2","children":{"core":"0-3"}}]}}
+```console
+{"resources": [{"type": "node", "count": 3, "with": [{"type": "slot", "count": 1, "with": [{"type": "core", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["./io-forwarding.py", "120"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/fluxuser/flux-workflow-examples/job-submit-api"}}, "version": 1}
 ```
 
----
-
 ### Notes
 
 - `f = flux.Flux()` creates a new Flux handle which can be used to connect to and interact with a Flux instance. 
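+
+  As a quick illustration of using the handle directly (a sketch added for this tutorial, not one of the repository's scripts; it assumes the `flux` Python bindings shipped with flux-core are importable, and uses a throwaway `sleep 0` job):
+
+  ```python
+  import os
+  import flux
+  from flux.job import JobspecV1
+
+  f = flux.Flux()              # connect to the enclosing Flux instance
+  print(f.attr_get("size"))    # e.g. how many brokers make up this instance
+
+  # build a one-task "sleep 0" job request and submit it through the handle
+  jobspec = JobspecV1.from_command(["sleep", "0"], num_tasks=1)
+  jobspec.environment = dict(os.environ)
+  print("submitted:", flux.job.submit(f, jobspec))
+  ```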
- - The following constructs a job request using the **JobspecV1** class with customizable parameters for how you want to utilize the resources allocated for your job: ```python compute_jobreq = JobspecV1.from_command( diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-cli/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-cli/README.md index b50e71a..0a34979 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-cli/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-cli/README.md @@ -1,81 +1,61 @@ -## Job Submit CLI +# Job Submit CLI To run the following examples, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```console $ cd flux-workflow-examples/job-submit-cli ``` -### Part(a) - Partitioning Schedule - -#### Description: Launch a flux instance and schedule/launch compute and io-forwarding jobs on separate nodes - -1. `salloc -N3 -ppdebug` - -2. `srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` - -3. `flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./compute.lua 120` +## Example -4. `flux mini submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua 120` +### Launch a flux instance and submit compute and io-forwarding jobs -5. List running jobs: +If you need an allocation: -`flux jobs` - -``` -JOBID USER NAME ST NTASKS NNODES RUNTIME RANKS -ƒ3ETxsR9H moussa1 io-forward R 1 1 2.858s 2 -ƒ38rBqEWT moussa1 compute.lu R 4 2 15.6s [0-1] +```bash +salloc -N3 -ppdebug +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out ``` -6. Get information about job: +To submit -`flux job info ƒ3ETxsR9H R` +```bash +# if you have more than one node... +flux submit --nodes=2 --ntasks=4 --cores-per-task=2 ./compute.lua 120 -``` -{"version":1,"execution":{"R_lite":[{"rank":"2","children":{"core":"0-1"}}]}} +# and if not! +flux submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua 120 ``` -`flux job info ƒ38rBqEWT R` +Attach to watch output: +```bash +# Control +C then Control+Z to detach +flux job attach $(flux job last) ``` -{"version":1,"execution":{"R_lite":[{"rank":"0-1","children":{"core":"0-3"}}]}} -``` - -### Part(b) - Overlapping Schedule - -#### Description: Launch a flux instance and schedule/launch both compute and io-forwarding jobs across all nodes -1. `salloc -N3 -ppdebug` - -2. `srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` - -3. `flux mini submit --nodes=3 --ntasks=6 --cores-per-task=2 ./compute.lua 120` - -4. `flux mini submit --nodes=3 --ntasks=3 --cores-per-task=1 ./io-forwarding.lua 120` - -5. List jobs in KVS: - -`flux jobs` +List running jobs: +```bash +flux jobs ``` -JOBID USER NAME ST NTASKS NNODES RUNTIME RANKS -ƒ3ghmgCpw moussa1 io-forward R 3 3 16.91s [0-2] -ƒ3dSybfQ3 moussa1 compute.lu R 6 3 24.3s [0-2] - +```console +JOBID USER NAME ST NTASKS NNODES RUNTIME +ƒ3ETxsR9H fluxuser io-forward R 1 1 2.858s +ƒ38rBqEWT fluxuser compute.lu R 4 2 15.6s ``` -6. 
Get information about job: - -`flux job info ƒ3ghmgCpw R` - -``` -{"version":1,"execution":{"R_lite":[{"rank":"0-2","children":{"core":"4"}}]}} -``` +Get information about job: -`flux job info ƒ3dSybfQ3 R` +```bash +flux job info $(flux job last) R +flux job info $(flux job last) jobspec +flux job info $(flux job last) eventlog +flux job info $(flux job last) guest.output +# Example with flux job id +flux job info ƒ3ETxsR9H R ``` -{"version":1,"execution":{"R_lite":[{"rank":"0-2","children":{"core":"0-3"}}]}} +```console +{"version": 1, "execution": {"R_lite": [{"rank": "0", "children": {"core": "5-7"}}], "nodelist": ["674f16a501e5"], "starttime": 1723225494, "expiration": 4876808372}} ``` diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-wait/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-wait/README.md index 1f8745e..c4fbd5d 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-wait/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/job-submit-wait/README.md @@ -1,29 +1,33 @@ -## Python Job Submit/Wait +# Python Job Submit/Wait To run the following examples, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```bash $ cd flux-workflow-examples/job-submit-wait ``` -### Part(a) - Python Job Submit/Wait +## Part(a) - Python Job Submit/Wait -#### Description: Submit jobs asynchronously and wait for them to complete in any order +### Description: Submit jobs asynchronously and wait for them to complete in any order -1. Allocate three nodes from a resource manager: +1. If needed, allocate three nodes from a resource manager: -`salloc -N3 -ppdebug` +```bash +salloc -N3 -ppdebug +``` 2. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: -`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` 3. Submit the **submitter_wait_any.py** script, along with the number of jobs you want to run (if no argument is passed, 10 jobs are submitted): -`./submitter_wait_any.py 10` - +```bash +python3 ./submitter_wait_any.py 10 ``` +```console submit: 46912591450240 compute_jobspec submit: 46912591450912 compute_jobspec submit: 46912591451080 compute_jobspec @@ -46,25 +50,28 @@ wait: 46912591451080 Success wait: 46912591362984 Success ``` ---- - -### Part(b) - Python Job Submit/Wait (Sliding Window) +## Part(b) - Python Job Submit/Wait (Sliding Window) -#### Description: Asynchronously submit jobs and keep at most a number of those jobs active +### Description: Asynchronously submit jobs and keep at most a number of those jobs active 1. Allocate three nodes from a resource manager: -`salloc -N3 -ppdebug` +```bash +salloc -N3 -ppdebug +``` 2. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: -`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` 3. 
Submit the **submitter_sliding_window.py** script, along with the number of jobs you want to run and the size of the window (if no argument is passed, 10 jobs are submitted and the window size is 2 jobs): -`./submitter_sliding_window.py 10 3` - +```bash +python3 ./submitter_sliding_window.py 10 3 ``` +```console submit: 5624175788032 submit: 5624611995648 submit: 5625014648832 @@ -87,25 +94,29 @@ wait: 5986882420736 Success wait: 6164435697664 Success ``` ---- -### Part(c) - Python Job Submit/Wait (Specific Job ID) +## Part(c) - Python Job Submit/Wait (Specific Job ID) -#### Description: Asynchronously submit jobs, block/wait for specific jobs to complete +### Description: Asynchronously submit jobs, block/wait for specific jobs to complete 1. Allocate three nodes from a resource manager: -`salloc -N3 -ppdebug` +```bash +salloc -N3 -ppdebug +``` 2. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory: -`srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +```bash +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +``` 3. Submit the **submitter_wait_in_order.py** script, along with the number of jobs you want to run (if no argument is passed, 10 jobs are submitted): -`./submitter_wait_in_order.py 10` - +```bash +python3 ./submitter_wait_in_order.py 10 ``` +```console submit: 46912593818008 compute_jobspec submit: 46912593818176 compute_jobspec submit: 46912593818344 compute_jobspec @@ -128,8 +139,6 @@ wait: 46912593819128 Error: job returned exit code 1 wait: 46912593819296 Error: job returned exit code 1 ``` ---- - ### Notes - The following constructs a job request using the **JobspecV1** class with customizable parameters for how you want to utilize the resources allocated for your job: diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/kvs-python-bindings/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/kvs-python-bindings/README.md index 0a67026..5c3aa22 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/kvs-python-bindings/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/kvs-python-bindings/README.md @@ -1,56 +1,62 @@ -## KVS Python Binding Example +# KVS Python Binding Example -### Description: Use the KVS Python interface to store user data into KVS +## Description: Use the KVS Python interface to store user data into KVS If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```console $ cd flux-workflow-examples/kvs-python-bindings ``` 1. Launch a Flux instance by running `flux start`, redirecting log messages to the file `out` in the current directory: -`flux start -s 1 -o,-S,log-filename=out` +```bash +flux start -s 1 -o,-S,log-filename=out +``` 2. Submit the Python script: -`flux mini submit -N 1 -n 1 ./kvsput-usrdata.py` - +```bash +flux submit -N 1 -n 1 ./kvsput-usrdata.py ``` +```console 6705031151616 ``` 3. Attach to the job and view output: -`flux job attach 6705031151616` - +```bash +flux job attach $(flux job last) ``` +```console hello world hello world again ``` 4. Each job is run within a KVS namespace. `FLUX_KVS_NAMESPACE` is set, which is automatically read and used by the KVS operations in the handle. 
To take a look at the job's KVS, convert its job ID to KVS: -`flux job id --from=dec --to=kvs 6705031151616` - +```bash +flux job id --to=kvs $(flux job last) ``` +```console job.0000.0619.2300.0000 ``` 5. The keys for this job will be put at the root of the namespace, which is mounted under "guest". To get the value stored under the first key "usrdata": -`flux kvs get job.0000.0619.2300.0000.guest.usrdata` - +```bash +flux kvs get job.0000.0619.2300.0000.guest.usrdata ``` +```bash "hello world" ``` 6. Get the value stored under the second key "usrdata2": -`flux kvs get job.0000.0619.2300.0000.guest.usrdata2` - +```bash +flux kvs get job.0000.0619.2300.0000.guest.usrdata2 ``` +```console "hello world again" ``` diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/synchronize-events/README.md b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/synchronize-events/README.md index 3fa4b53..641be09 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/synchronize-events/README.md +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/synchronize-events/README.md @@ -1,41 +1,44 @@ -### Using Events with Separate Nodes +# Using Events with Separate Nodes -#### Description: Using events to synchronize compute and io-forwarding jobs running on separate nodes +## Description: Using events to synchronize compute and io-forwarding jobs running on separate nodes If you haven't already, download the files and change your working directory: -``` -$ git clone https://github.com/flux-framework/flux-workflow-examples.git +```console $ cd flux-workflow-examples/synchronize-events ``` -1. `salloc -N3 -ppdebug` - -2. `srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out` +Ask for a Slurm allocation, if relevant: -3. `flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./compute.lua 120` - -**Output -** `225284456448` +```bash +salloc -N3 -ppdebug +srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out +flux submit --nodes=1 --ntasks=4 --cores-per-task=2 ./compute.lua 120 +``` -4. `flux mini submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua 120` +And: -**Output -** `344889229312` +```bash +flux submit --nodes=1 --ntasks=1 --cores-per-task=2 ./io-forwarding.lua 120 +``` 5. List running jobs: -`flux jobs` - +```bash +flux jobs +``` ``` JOBID USER NAME ST NTASKS NNODES RUNTIME RANKS -ƒA4TgT7d moussa1 io-forward R 1 1 4.376s 2 -ƒ6vEcj7M moussa1 compute.lu R 4 2 11.51s [0-1] +ƒA4TgT7d fluxuser io-forward R 1 1 4.376s 2 +ƒ6vEcj7M fluxuser compute.lu R 4 2 11.51s [0-1] ``` 6. Attach to running or completed job output: -`flux job attach ƒ6vEcj7M` - +```bash +flux job attach ƒ6vEcj7M ``` +```console Block until we hear go message from the an io forwarder Block until we hear go message from the an io forwarder Recv an event: please proceed