Containerization makes VEBA portable, so it can run on virtually any system, including cloud resources such as AWS or Google Cloud. This guide covers using the VEBA containers specifically with AWS, which involves the following steps:
- Set up AWS infrastructure
- Create and register a job definition
- Submit a job to the queue using the job definition
Setting up the AWS infrastructure is out of scope for this tutorial, but essentially you need to do the following:
- Set up AWS EFS (Elastic File System) via Terraform to read/write/mount data
- Compile database in EFS
- Create a compute environment
- Create a job queue linked to the compute environment (a minimal CLI sketch of these last two steps follows this list)
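For reference, a managed Fargate compute environment and a linked job queue can be created with the AWS CLI roughly as shown below; the environment name, queue name, subnet ID, and security group ID are placeholders you would replace with your own values (the EFS/Terraform and database steps are not covered here).

# Create a managed Fargate compute environment (subnet and security group IDs are placeholders)
aws batch create-compute-environment \
    --compute-environment-name veba-fargate-ce \
    --type MANAGED \
    --compute-resources '{"type": "FARGATE", "maxvCpus": 256, "subnets": ["subnet-xxx"], "securityGroupIds": ["sg-xxx"]}'

# Create a job queue linked to that compute environment
aws batch create-job-queue \
    --job-queue-name veba-fargate-queue \
    --priority 1 \
    --compute-environment-order '[{"order": 1, "computeEnvironment": "veba-fargate-ce"}]'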
Once the job queue is properly set up, the next steps are to create a job definition and then submit a job to the queue using that definition.
The preferred way to submit jobs to AWS Batch is with JSON job definitions that run on Fargate.
Here is a template you can use for a job definition. It pulls the jolespin/veba_preprocess Docker image, mounts EFS directories as volumes within the Docker container, and runs the preprocess.py module of VEBA on a sample called S1.
{
"jobDefinitionName": "preprocess__S1",
"type": "container",
"containerProperties": {
"image": "jolespin/veba_preprocess:2.4.2",
"command": [
"preprocess.py",
"-1",
"/volumes/input/Fastq/S1_1.fastq.gz",
"-2",
"/volumes/input/Fastq/S1_2.fastq.gz",
"-n",
"1",
"-o",
"/volumes/output/veba_output/preprocess",
"-p",
"16"
"-x",
"/volumes/database/Contamination/chm13v2.0/chm13v2.0"
],
"jobRoleArn": "arn:aws:iam::xxx:role/ecsTaskExecutionRole",
"executionRoleArn": "arn:aws:iam::xxx:role/ecsTaskExecutionRole",
"volumes": [
{
"name": "efs-volume-database",
"efsVolumeConfiguration": {
"fileSystemId": "fs-xxx",
"transitEncryption": "ENABLED",
"rootDirectory": "databases/veba/VDB_v8/"
}
},
{
"name": "efs-volume-input",
"efsVolumeConfiguration": {
"fileSystemId": "fs-xxx",
"transitEncryption": "ENABLED",
"rootDirectory": "path/to/efs/input/"
}
},
{
"name": "efs-volume-output",
"efsVolumeConfiguration": {
"fileSystemId": "fs-xxx",
"transitEncryption": "ENABLED",
"rootDirectory": "path/to/efs/output/"
}
}
],
"mountPoints": [
{
"sourceVolume": "efs-volume-database",
"containerPath": "/volumes/database",
"readOnly": true
},
{
"sourceVolume": "efs-volume-input",
"containerPath": "/volumes/input",
"readOnly": true
},
{
"sourceVolume": "efs-volume-output",
"containerPath": "/volumes/output",
"readOnly": false
}
],
"environment": [],
"ulimits": [],
"resourceRequirements": [
{
"value": "16.0",
"type": "VCPU"
},
{
"value": "8000",
"type": "MEMORY"
}
],
"networkConfiguration": {
"assignPublicIp": "ENABLED"
},
"fargatePlatformConfiguration": {
"platformVersion": "LATEST"
},
"ephemeralStorage": {
"sizeInGiB": 40
}
},
"platformCapabilities": [
"FARGATE"
]
}
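Before registering, it can help to confirm the file is well-formed JSON. This is an optional check that assumes jq is installed locally (python -m json.tool works as well):
jq . /path/to/preprocess/S1.json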
Now register the job definition:
FILE=/path/to/preprocess/S1.json
aws batch register-job-definition --cli-input-json file://${FILE}
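If you generate one JSON file per sample, a simple loop registers them all; this is just a sketch assuming the per-sample files live together in one directory:
for FILE in /path/to/preprocess/*.json; do
    aws batch register-job-definition --cli-input-json file://${FILE}
done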
The next step is to submit the job to the queue. Here the job name is the same as the registered job definition name (preprocess__S1), so one variable serves for both.
QUEUE="some-aws-job-queue-name"
JOB_NAME="preprocess__S1"
aws batch submit-job --job-definition ${JOB_NAME} --job-name ${JOB_NAME} --job-queue ${QUEUE}
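After submission, the job can be monitored with the standard AWS Batch commands; <job-id> below stands for the jobId returned by submit-job:
# List jobs currently running in the queue
aws batch list-jobs --job-queue ${QUEUE} --job-status RUNNING

# Inspect a specific job
aws batch describe-jobs --jobs <job-id>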