job preparation sequence diagram + some fixes
telliere committed Mar 28, 2024
1 parent f06f4da commit 5be9d8b
Showing 5 changed files with 97 additions and 11 deletions.
3 changes: 1 addition & 2 deletions .pre-commit-config.yaml
@@ -3,7 +3,6 @@ repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
@@ -31,4 +30,4 @@ repos:
rev: v4.0.0-alpha.8
hooks:
- id: prettier
files: \.(js|ts|jsx|tsx|css|less|html|json|markdown|md|yaml|yml)$
files: \.(js|ts|jsx|tsx|css|less|html|json|markdown|yaml|yml)$
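The two `files:` patterns in the prettier hunk above differ only in the `md` alternative. pre-commit treats `files` as a Python regular expression searched against each path, and because the pattern is anchored with `$`, the `markdown` alternative does not cover a plain `.md` suffix. The quick check below makes the effect concrete; the sample filenames are illustrative and not taken from the repository listing.

```python
import re

# The two `files:` patterns from the prettier hunk; they differ only in `md`.
WITH_MD = r"\.(js|ts|jsx|tsx|css|less|html|json|markdown|md|yaml|yml)$"
WITHOUT_MD = r"\.(js|ts|jsx|tsx|css|less|html|json|markdown|yaml|yml)$"

# Illustrative filenames, not taken from the repository listing.
SAMPLES = [
    "docs/architecture/job_preparation.md",
    "notes.markdown",
    ".pre-commit-config.yaml",
    "prepare_container.py",
]


def selected(pattern: str, paths: list[str]) -> list[str]:
    """Return the paths that `pattern` matches, using re.search on each path."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


if __name__ == "__main__":
    print("with md:   ", selected(WITH_MD, SAMPLES))
    print("without md:", selected(WITHOUT_MD, SAMPLES))
```

With these samples the only difference is the `.md` path: the pattern that lists `md` selects it and the other one skips it, so whether prettier formats the Markdown docs depends on which version of the line is in effect.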
4 changes: 2 additions & 2 deletions docs/architecture.md → docs/architecture/architecture.md
@@ -135,8 +135,8 @@ CN4SBATCH <--"HTTPS"--> Vault
HPCSCDPB <--"SSH (As user - Data & Info files)"--> LN
HPCSCCPB <--"SSH (As user - Container image & Info files)"--> LN
HPCSCJPB --"SSH (As user - SBATCH file & CLI Call to SBATCH)"--> LN
HPCSCJPB --"SSH (As user - SBATCH file & CLI Call to SBATCH)"--> LN
LN --"SSH (As user - Info files)"--> HPCSCJPB
```

This diagram doesn't show the HTTPS requests from client/compute node to HPCS Server used to register the agents since this behaviour is a practical workaround. See section "Limitations" in [HPCS/README.md](https://github.com/CSCfi/HPCS/blob/main/README.md#limitations) for more information.
This diagram doesn't show the HTTPS requests from client/compute node to HPCS Server used to register the agents since this behaviour is a practical workaround.
15 changes: 10 additions & 5 deletions docs/architecture/container_preparation.md
@@ -6,6 +6,7 @@ This step consist in using an original OCI image to prepare it, encrypt it and s

```mermaid
sequenceDiagram
actor User
User -->> Container Preparation container: spawns using docker-compose
Container Preparation container -->> Spire Agent: spawns using `spawn_agent.py`
Spire Agent ->> Spire Server: Runs node attestation
@@ -29,22 +30,24 @@ sequenceDiagram
HPCS Server ->> Container Preparation container: SpiffeID & role to access the container, path to the secret
Container Preparation container ->> Container Preparation container: Parse info file based on previous steps
Container Preparation container ->> Supercomputer: Ship encrypted container
Supercomputer ->> Container Preparation container:
Supercomputer ->> Container Preparation container: '
Container Preparation container ->> Supercomputer: Ship info file
Supercomputer ->> Container Preparation container:
Container Preparation container -->> Spire Agent: Kills
Spire Agent -->> Container Preparation container:
Spire Agent -->> Container Preparation container: Dies
Spire Agent -->> Container Preparation container: Dies
Container Preparation container -->> User: Finishes
```


## Sequence diagram of the container's preparation (without shipping)

### Image is prepared and then encrypted (Encryption at rest)

This step is currently (3/2024) used to encrypt the container. It does not require changes on LUMI to work.

```mermaid
sequenceDiagram
actor User
User -->>HPCS Client: spawns using `python3 prepare_container.py [OPTIONS]`
HPCS Client -->> Docker Client: spawns
HPCS Client ->> HPCS Client: Create prepared Dockerfile
@@ -59,11 +62,13 @@ sequenceDiagram
HPCS Client ->> HPCS Client: Encrypt image file
```


### Image is prepared and SIF encrypted

When HPC nodes support encrypted containers, this process can be used.

```mermaid
sequenceDiagram
actor User
User -->>HPCS Client: spawns using `python3 prepare_container.py [OPTIONS]`
HPCS Client -->> Docker Client: spawns
HPCS Client ->> HPCS Client: Create prepared Dockerfile
@@ -75,4 +80,4 @@ sequenceDiagram
Docker Client -->> Build-Env: Spawns
Build-Env ->> Build-Env: Build final prepared and encrypted SIF image
Build-Env ->> HPCS Client: Returns final prepared and encrypted SIF image
```
```
5 changes: 3 additions & 2 deletions docs/architecture/data_preparation.md
@@ -6,6 +6,7 @@ This step consists in using an input directory, encrypt it and ship it to the su

```mermaid
sequenceDiagram
actor User
User -->> Data Preparation container: spawns using docker-compose
Data Preparation container -->> Spire Agent: spawns using `spawn_agent.py`
Spire Agent ->> Spire Server: Runs node attestation
@@ -34,6 +35,6 @@ sequenceDiagram
Supercomputer ->> Data Preparation container:
Data Preparation container -->> Spire Agent: Kills
Spire Agent -->> Data Preparation container:
Spire Agent -->> Data Preparation container: Dies
Spire Agent -->> Data Preparation container: Dies
Data Preparation container -->> User: Finishes
```
```
81 changes: 81 additions & 0 deletions docs/architecture/job_preparation.md
@@ -0,0 +1,81 @@
# Job preparation

This step consists of preparing the secure job and then executing it. It requires two info files (one for the data, one for the secured container) and additional runtime settings (arguments, parameters for the Singularity container, etc.).

## Sequence diagram of this step

```mermaid
sequenceDiagram
actor User
participant Job Preparation container
participant Login Node
participant Scheduler
User -->> Job Preparation container: spawns using docker-compose
Job Preparation container ->> Login Node: Initiate SSH Connection
rect rgb(191, 223, 255)
note right of User: Job preparation
Job Preparation container ->> Login Node: SCP Data's info file
Login Node ->> Job Preparation container: Info file
Job Preparation container ->> Job Preparation container: Parse info from info file
Job Preparation container ->> Login Node: SCP Container image's info file
Login Node ->> Job Preparation container: Info file
Job Preparation container ->> Job Preparation container: Parse info from info file
Job Preparation container ->> Job Preparation container: Generate SBATCH file from template based on info gathered
Job Preparation container ->> Login Node: Copy SBATCH File and HPCS Configuration file
Login Node ->> Job Preparation container:
Job Preparation container ->> Job Preparation container: Generate keypair for output data
Job Preparation container ->> Login Node: Copy encryption key
Login Node ->> Job Preparation container:
end
rect rgb(191, 223, 255)
note right of User: Job runtime
Job Preparation container ->> Login Node: SSH Execute "sbatch SBATCHFILE"
Login Node ->>+ Scheduler: sbatch SBATCHFILE
Scheduler ->> Login Node: Job created + Job id
Login Node ->> Job Preparation container: Job created + Job id
Job Preparation container ->> Job Preparation container: Follows job output or job status
activate Job Preparation container
Scheduler ->> Scheduler: Scheduling job
activate Scheduler
deactivate Scheduler
Scheduler ->> Compute node: Elect node - Execute SBATCHFILE
Compute node ->> Compute node: Clone HPCS GitHub repository / Download age and gocryptfs binaries
Compute node -->> Spire Agent: spawns using `spawn_agent.py`
Spire Agent ->> Spire Server: Runs node attestation
Spire Server ->> Spire Agent: Attests node, provides SVIDs for linked identities
Compute node ->> Spire Agent: Fetches API to get an SVID
Spire Agent ->> Compute node: Provides SVID
Compute node ->> Vault: Log-in using SVID
Vault ->> Compute node: Returns an authentication token (read only on container key's path)
Compute node ->> Vault: Read container's key using authentication token
Vault ->> Compute node: Returns container's key
Compute node ->> Compute node: Decrypt container image
Compute node ->> Compute node: Set up secure environment for runtime (encrypted volumes, gather flags, etc.)
Compute node ->> Spire Agent: Fetches API to get an SVID
Spire Agent ->> Compute node: Provides SVID
Compute node ->> Compute node: Export SVID and data secret path in a variable
Compute node -->> Application container: spawns using `singularity run`
Application container ->> Vault: Log-in using SVID
Vault ->> Application container: Returns an authentication token (read only on data key's path)
Application container ->> Vault: Read data's key using authentication token
Vault ->> Application container: Returns data's key
Application container ->> Application container: Decrypt data using key
Application container ->> Application container: Runs input scripts
Application container ->> Application container: Application runs
Application container ->> Application container: Runs output scripts
Application container ->> Application container: Encrypt output directory
Application container -->> Compute node: Finishes
Compute node -->> Spire Agent: Kills
Spire Agent -->> Compute node:
Spire Agent -->> Compute node: Dies
Compute node ->> Scheduler: Becomes available
deactivate Job Preparation container
end
Job Preparation container ->> Login Node: Close SSH connection
Login Node ->> Job Preparation container:
Login Node ->> Job Preparation container: Close SSH connection
Job Preparation container -->> User: Finishes
```
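The "Generate SBATCH file from template based on info gathered" step of the job preparation diagram can be pictured with Python's `string.Template`. This is a minimal sketch under stated assumptions, not the HPCS implementation: it assumes both info files are JSON, and the field names (`encrypted_path`, `secret_path`), the runtime keys (`job_name`, `nodes`, `time`), the `HPCS_*` variable names and the template body are all hypothetical.

```python
import json
from string import Template

# Hypothetical SBATCH template: directive values and HPCS_* variable names
# are illustrative, not the template shipped with HPCS.
SBATCH_TEMPLATE = Template(
    """#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --nodes=$nodes
#SBATCH --time=$time

# Values gathered from the container and data info files
export HPCS_CONTAINER_PATH="$container_path"
export HPCS_CONTAINER_SECRET="$container_secret"
export HPCS_DATA_PATH="$data_path"
export HPCS_DATA_SECRET="$data_secret"

# Secure runtime steps (SPIRE agent, Vault login, decryption, run) go here
"""
)


def render_sbatch(container_info: str, data_info: str, runtime: dict) -> str:
    """Merge two JSON info files and runtime settings into an SBATCH script."""
    container = json.loads(container_info)  # hypothetical JSON info file format
    data = json.loads(data_info)
    return SBATCH_TEMPLATE.substitute(
        job_name=runtime["job_name"],
        nodes=runtime["nodes"],
        time=runtime["time"],
        container_path=container["encrypted_path"],
        container_secret=container["secret_path"],
        data_path=data["encrypted_path"],
        data_secret=data["secret_path"],
    )


if __name__ == "__main__":
    print(
        render_sbatch(
            '{"encrypted_path": "/scratch/example/app.sif.enc", "secret_path": "example/container"}',
            '{"encrypted_path": "/scratch/example/data.enc", "secret_path": "example/data"}',
            {"job_name": "hpcs-demo", "nodes": 1, "time": "00:30:00"},
        )
    )
```

`Template.substitute` raises `KeyError` when a placeholder is left unfilled, so a field missing from either info file fails loudly before anything is copied to the login node.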

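After the `sbatch SBATCHFILE` call over SSH, the job preparation container needs the job ID returned through the login node in order to follow the job. Slurm's `sbatch` prints `Submitted batch job <id>` by default, or just `<id>` (optionally `<id>;<cluster>`) when run with `--parsable`. The sketch below extracts the ID from either form. Driving `sbatch` through the plain `ssh` command via `subprocess` is an assumption for illustration (the HPCS client may use another SSH mechanism), and `submit_over_ssh`, `host` and `sbatch_file` are hypothetical names.

```python
import re
import subprocess

# Default sbatch output: "Submitted batch job <id>".
# With --parsable, sbatch prints "<id>" or "<id>;<cluster>".
SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout: str) -> int:
    """Return the Slurm job ID found in sbatch's standard output."""
    text = sbatch_stdout.strip()
    match = SUBMITTED.search(text)
    if match:
        return int(match.group(1))
    # --parsable form: keep the part before an optional ";<cluster>" suffix.
    first = text.split(";", 1)[0]
    if first.isdigit():
        return int(first)
    raise ValueError(f"unrecognised sbatch output: {sbatch_stdout!r}")


def submit_over_ssh(host: str, sbatch_file: str) -> int:
    """Hypothetical helper: run sbatch on the login node over SSH, return the job ID."""
    result = subprocess.run(
        ["ssh", host, "sbatch", sbatch_file],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)


if __name__ == "__main__":
    print(parse_job_id("Submitted batch job 12345\n"))
    print(parse_job_id("67890;cluster1"))
```

Requesting `--parsable` output keeps parsing trivial; the regex branch covers the default human-readable message.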