
flux-workflow-examples: update content
conduit still does not compile, and a note
was added about that.

Signed-off-by: vsoch <[email protected]>
vsoch committed Aug 9, 2024
1 parent a6a2e14 commit b00ccf1
Showing 17 changed files with 392 additions and 302 deletions.
@@ -1,73 +1,72 @@
# Flux Workflow Examples

This content used to be hosted at [flux-framework/flux-workflow-examples](https://github.com/flux-framework/flux-workflow-examples) and has been moved here for annual updates paired with the Flux Tutorials.

The examples contained here demonstrate and explain some simple use-cases with Flux,
and make use of Flux's command-line interface (CLI), Flux's C library,
and the Python and Lua bindings to the C library.
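
If you want a taste of what the Python bindings look like before diving in, here is a minimal sketch (an illustration, not one of the examples; it assumes a running Flux instance and the `flux` Python package on your `PYTHONPATH`):

```python
import flux
import flux.job
from flux.job import JobspecV1

# Connect to the enclosing Flux instance and submit a one-task job
h = flux.Flux()
jobid = flux.job.submit(h, JobspecV1.from_command(["hostname"]))
print(f"Submitted {jobid}")
```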

## Requirements

The examples assume that you have installed:

1. A recent version of Flux

2. Python 3.6+

3. Lua 5.1+

You can also use an interactive container locally, binding this directory to the container:

```bash
docker run -it -v $(pwd):/home/fluxuser/flux-workflow-examples fluxrm/flux-sched:jammy
cd /home/fluxuser/flux-workflow-examples/
```
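
Once you have a Flux instance running (e.g., after `flux start`), you can sanity-check your environment; a quick probe with the Python bindings (illustrative sketch):

```python
import flux

# Print the size (number of broker ranks) of the enclosing instance
h = flux.Flux()
print("instance size:", h.attr_get("size"))
```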

**_1. [CLI: Job Submission](job-submit-cli)_**

Launch a Flux instance and schedule/launch compute and io-forwarding jobs on
separate nodes using the CLI

**_2. [Python: Job Submission](job-submit-api)_**

Schedule/launch compute and io-forwarding jobs on separate nodes using the Python bindings

**_3. [Python: Job Submit/Wait](job-submit-wait)_**

Submit jobs and wait for them to complete using the Flux Python bindings

**_4. [Python: Asynchronous Bulk Job Submission](async-bulk-job-submit)_**

Asynchronously submit jobspec files from a directory and wait for them to complete in any order

**_5. [Python: Tracking Job Status and Events](job-status-control)_**

Submit job bundles, get event updates, and wait until all jobs complete

**_6. [Python: Job Cancellation](job-cancel)_**

Cancel a running job

**_7. [Lua: Use Events](synchronize-events)_**

Use events to synchronize compute and io-forwarding jobs running on separate
nodes

**_8. [Python: Simple KVS Example](kvs-python-bindings)_**

Use KVS Python interfaces to store user data into KVS

**_9. [CLI/Lua: Job Ensemble Submitted with a New Flux Instance](job-ensemble)_**

Submit job bundles, print live job events, and exit when all jobs are complete

**_10. [CLI: Hierarchical Launching](hierarchical-launching)_**

Launch a large number of sleep 0 jobs

**_11. [C/Lua: Use a Flux Comms Module](comms-module)_**

Use a Flux Comms Module to communicate with job elements

**_12. [C/Python: A Data Conduit Strategy](data-conduit)_**

Attach to a job that receives OS time data from compute jobs
@@ -1,7 +1,6 @@
# Python Asynchronous Bulk Job Submission

Parts (a) and (b) demonstrate different implementations of the same basic use-case---submitting large numbers of jobs to Flux. For simplicity, in these examples all of the jobs are identical.

In part (a), we use the `flux.job.submit_async` and `flux.job.wait` functions to submit jobs and wait for them.
In part (b), we use the `FluxExecutor` class, which offers a higher-level interface. It is important to note that the executor's futures fulfill in the background and callbacks added to the futures may be invoked by different threads; the `submit_async` futures do not fulfill in the background, callbacks are always invoked by the same thread that added them, and sharing the futures among threads is not supported.
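
A condensed sketch of the two approaches (illustrative only; it assumes a running Flux instance and submits trivial `sleep 0` jobs):

```python
import flux
import flux.job
from flux.job import JobspecV1, FluxExecutor

jobspec = JobspecV1.from_command(["sleep", "0"])

# Part (a) style: submit_async + wait; futures fulfill on this thread
h = flux.Flux()
for _ in range(4):
    flux.job.submit_async(h, jobspec, waitable=True)
for _ in range(4):
    print(flux.job.wait(h))  # returns jobs in any completion order

# Part (b) style: FluxExecutor; futures fulfill in background threads
with FluxExecutor() as executor:
    futures = [executor.submit(jobspec) for _ in range(4)]
    for future in futures:
        future.result()  # block until the job completes
```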

## Setup - Downloading the Files

If you haven't already, download the files and change your working directory:

```bash
$ git clone https://github.com/flux-framework/flux-workflow-examples.git
$ cd flux-workflow-examples/async-bulk-job-submit
```

## Part (a) - Using `submit_async`

### Description: Asynchronously submit jobspec files from a directory and wait for them to complete in any order

1. Allocate three nodes from a resource manager:

```bash
salloc -N3 -ppdebug
```

2. Make a **jobs** directory in `/tmp`:

```bash
mkdir /tmp/jobs
```

3. If you are running Slurm, launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory:

```bash
srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out
```

4. Store the jobspec of a `sleep 0` job in the **jobs** directory:

```bash
flux run --dry-run -n1 sleep 0 > /tmp/jobs/0.json
```

5. Copy the jobspec of **job0** 1024 times to create a directory of 1025 `sleep 0` jobs (a Python alternative to steps 4-5 is sketched after this list):

```bash
for i in `seq 1 1024`; do cp /tmp/jobs/0.json /tmp/jobs/${i}.json; done
```

6. Run the **bulksubmit.py** script and pass all jobspecs in the **jobs** directory as arguments with the shell glob `/tmp/jobs/*.json`:

```bash
./bulksubmit.py /tmp/jobs/*.json
```
```console
bulksubmit: Starting...
bulksubmit: submitted 1025 jobs in 0.43s. 2392.93job/s
bulksubmit: First job finished in about 0.521s
|██████████████████████████████████████████████████████████| 100.0% (274.3 job/s)
bulksubmit: Ran 1025 jobs in 3.7s. 274.3 job/s
```
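
As mentioned in step 5, the same directory of jobspec files could be generated directly with the Python bindings instead of `--dry-run` and `cp` (a hypothetical alternative, not one of the example scripts):

```python
import os
from flux.job import JobspecV1

# Serialize one sleep 0 jobspec and write it out 1025 times
os.makedirs("/tmp/jobs", exist_ok=True)
spec = JobspecV1.from_command(["sleep", "0"], num_tasks=1).dumps()
for i in range(1025):
    with open(f"/tmp/jobs/{i}.json", "w") as f:
        f.write(spec)
```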

### Notes to Part (a)
@@ -65,7 +74,7 @@

```python
if h.reactor_run() < 0:
    h.fatal_error("reactor start failed")
```

The reactor will return automatically when there are no more outstanding RPC responses, i.e., all jobs have been submitted.
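
Put together, the submit side of **bulksubmit.py** looks roughly like this (a condensed, hypothetical sketch of the real script):

```python
import flux
import flux.job
from flux.job import JobspecV1

h = flux.Flux()
jobspec = JobspecV1.from_command(["sleep", "0"])

def submit_cb(future):
    # Runs on the reactor thread once the submit RPC gets a response
    print("submitted:", future.get_id())

for _ in range(16):
    flux.job.submit_async(h, jobspec).then(submit_cb)

# Returns once no RPC responses are outstanding, i.e. all jobs submitted
if h.reactor_run() < 0:
    h.fatal_error("reactor start failed")
```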
@@ -81,19 +90,24 @@

## Part (b) - Using `FluxExecutor`

If continuing from part (a), skip to step 3.

1. Allocate three nodes from a resource manager:

```bash
salloc -N3 -ppdebug
```

2. Launch a Flux instance on the current allocation by running `flux start` once per node, redirecting log messages to the file `out` in the current directory:

```bash
srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out
```

3. Run the **bulksubmit_executor.py** script and pass the command (`/bin/sleep 0` in this example) and the number of times to run it (default is 100); a sketch of the executor pattern follows the sample output:

```bash
./bulksubmit_executor.py -n200 /bin/sleep 0
```
```console
bulksubmit_executor: submitted 200 jobs in 0.18s. 1087.27job/s
bulksubmit_executor: First job finished in about 0.248s
|██████████████████████████████████████████████████████████| 100.0% (229.8 job/s)
bulksubmit_executor: Ran 200 jobs in 1.0s. 199.6 job/s
```
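
The heart of the executor approach is concurrent.futures-style code like the following (an illustrative sketch, not the full script; note that done-callbacks may run on background threads):

```python
import threading
from flux.job import JobspecV1, FluxExecutor

def done_cb(future):
    # May be invoked from a background thread, per the note above
    print("job finished on", threading.current_thread().name)

jobspec = JobspecV1.from_command(["/bin/sleep", "0"])
with FluxExecutor() as executor:
    for _ in range(200):
        executor.submit(jobspec).add_done_callback(done_cb)
# Leaving the with-block waits for all outstanding futures
```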
@@ -1,36 +1,93 @@
# Using a Flux Comms Module

## Description: Use a Flux comms module to communicate with job elements

### Setup

If you haven't already, download the files and change your working directory:

```bash
$ git clone https://github.com/flux-framework/flux-workflow-examples.git
$ cd flux-workflow-examples/comms-module
```

### Execution

If you need to get an allocation on Slurm:

```bash
salloc -N3 -ppdebug
```

Point to `flux-core`'s `pkgconfig` directory:

| Shell | Command |
| ----- | ---------- |
| tcsh | `setenv PKG_CONFIG_PATH <FLUX_INSTALL_PATH>/lib/pkgconfig` |
| bash/zsh | `export PKG_CONFIG_PATH='<FLUX_INSTALL_PATH>/lib/pkgconfig'` |

In the container, this might look like:

```bash
export PKG_CONFIG_PATH=/usr/lib/pkgconfig
```

Then build the module (if you don't have write permission, copy it to `/tmp` first):

```bash
cp -R ./comms-module /tmp/comms-module
cd /tmp/comms-module
make
```

Add the directory of the modules to `FLUX_MODULE_PATH` and load them; if the module was built in the current directory:

```bash
export FLUX_MODULE_PATH=${FLUX_MODULE_PATH}:$(pwd)
flux module load ioapp.so
flux module load capp.so
```

Now let's try it! If you need to run `flux start` under Slurm:

```bash
srun --pty --mpi=none -N3 flux start -o,-S,log-filename=out
```

Try running the scripts with the module on the path:

```bash
flux run -N 1 -n 2 ./compute.lua 120
flux run -N 1 -n 2 ./io-forwarding.lua 120
```
Notice that the module is loaded (at the bottom):

```console
Try `flux-module load --help' for more information.
Module Idle S Sendq Recvq Service
heartbeat 1 R 0 0
resource 0 R 0 0
job-ingest 0 R 0 0
kvs-watch 0 R 0 0
sched-fluxion-resource 0 R 0 0
cron idle R 0 0
barrier idle R 0 0
job-exec 0 R 0 0
job-list idle R 0 0
kvs 0 R 0 0
content-sqlite 0 R 0 0 content-backing
job-info 0 R 0 0
job-manager 0 R 0 0
sched-fluxion-qmanager 0 R 0 0 sched
content 0 R 0 0
connector-local 0 R 0 0 1002-shell-f3Lv2Zd3tj,1002-shell-f3N2WmZB5H
ioapp 83 R 0 0
Block until we hear go message from the an io forwarder
```

If you run them together, they work together:

```bash
flux submit -N 1 -n 2 ./compute.lua 120
flux run -N 1 -n 2 ./io-forwarding.lua 120
```
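
Since a comms module is just a broker service, you can also poke at services from Python. A minimal sketch using the built-in `broker.ping` service (the ioapp/capp topics are defined in the module sources, so `broker.ping` stands in here):

```python
import flux

# RPC to the broker's ping service; a loaded comms module's own
# topics could be targeted the same way via h.rpc("<service>.<method>")
h = flux.Flux()
resp = h.rpc("broker.ping", {"seq": 1, "pad": "hello"}).get()
print(resp)
```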
@@ -1,29 +1,47 @@
# A Data Conduit Strategy

**Note that this module script does not compile and needs an update**

## Description: Use a data stream to send packets through

### Setup

If you haven't already, download the files and change your working directory:

```bash
$ git clone https://github.com/flux-framework/flux-workflow-examples.git
$ cd flux-workflow-examples/data-conduit
```

### Execution

If you are using Slurm, allocate three nodes from a resource manager:

```bash
salloc -N3 -ppdebug
```

Point to `flux-core`'s `pkgconfig` directory:

| Shell | Command |
| ----- | ---------- |
| tcsh | `setenv PKG_CONFIG_PATH <FLUX_INSTALL_PATH>/lib/pkgconfig` |
| bash/zsh | `export PKG_CONFIG_PATH='<FLUX_INSTALL_PATH>/lib/pkgconfig'` |

In the container, this might look like:

```bash
export PKG_CONFIG_PATH=/usr/lib/pkgconfig
```

Then build the module (if you don't have write permission, copy it to `/tmp` first):

```bash
cp -R ./data-conduit /tmp/data-conduit
cd /tmp/data-conduit
make
```

Add the directory of the modules to `FLUX_MODULE_PATH`, if the module was built in the current directory:
@@ -45,7 +45,7 @@ static struct conduit_ctx *getctx (flux_t *h)
return ctx;
}

/* Forward the received JSON string to the datastore.py */
static int conduit_send (flux_t *h, const char *json_str)
{
int rc = -1;