
Dagu

Dagu is a powerful Cron alternative with a built-in Web UI. It lets you define dependencies between commands in a declarative YAML format. Dagu also natively supports running Docker containers, making HTTP requests, and executing commands over SSH. It is designed to be easy to use, self-contained, and to require no coding, which makes it well suited to small projects.

Why Dagu?

Dagu is a modern workflow engine that combines simplicity with power, designed for developers who need reliable automation without the overhead. Here's what makes Dagu stand out:

  • Language Agnostic: Run any command or script regardless of programming language. Whether you're working with Python, Node.js, Bash, or any other language, Dagu seamlessly integrates with your existing tools and scripts.

  • Local-First Architecture: Deploy and run workflows directly on your machine without external dependencies. This local-first approach ensures complete control over your automation while maintaining the flexibility to scale to distributed environments when needed.

  • Zero Configuration: Get started in minutes with minimal setup. Dagu uses simple YAML files to define workflows, eliminating the need for complex configurations or infrastructure setup.

  • Built for Developers: Designed with software engineers in mind, Dagu provides powerful features like dependency management, retry logic, and parallel execution while maintaining a clean, intuitive interface.

  • Cloud Native Ready: While running perfectly on local environments, Dagu is built to seamlessly integrate with modern cloud infrastructure when you need to scale.

Core Features

  • Workflow Management
    • Declarative YAML definitions
    • Dependency management
    • Parallel execution
    • Sub-workflows
    • Conditional execution with regex
    • Timeouts and automatic retries
  • Execution & Integration
    • Native Docker support
    • SSH command execution
    • HTTP requests
    • JSON processing
    • Email notifications
  • Operations
    • Web UI for monitoring
    • Real-time logs
    • Execution history
    • Flexible scheduling
    • Environment variables
    • Automatic logging

Common Use Cases

  • Data Processing
  • Scheduled Tasks
  • Media Processing
  • CI/CD Automation
  • ETL Pipelines
  • Agentic Workflows

Installation

Dagu can be installed in multiple ways, such as using Homebrew or downloading a single binary from GitHub releases.

Via Bash script

curl -L https://raw.githubusercontent.com/dagu-org/dagu/main/scripts/installer.sh | bash

Via GitHub Releases Page

Download the latest binary from the Releases page and place it in your $PATH (e.g. /usr/local/bin).

Via Homebrew (macOS)

brew install dagu-org/brew/dagu

Upgrade to the latest version:

brew upgrade dagu-org/brew/dagu

Via Docker

docker run \
--rm \
-p 8080:8080 \
-v ~/.config/dagu:/config \
-e DAGU_TZ=`ls -l /etc/localtime | awk -F'/zoneinfo/' '{print $2}'` \
ghcr.io/dagu-org/dagu:latest dagu start-all

Note: The environment variable DAGU_TZ is the timezone for the scheduler and server. You can set it to your local timezone (e.g. America/New_York).

The command above mounts ~/.config/dagu into the container as /config. See Environment variables to change the default directories.
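
The DAGU_TZ expression in the docker run command reads the zone name out of the path that /etc/localtime points to. Here is a minimal sketch of the same idea, assuming /etc/localtime is a symlink into a zoneinfo directory. The helper name tz_from_path and the UTC fallback are illustrative choices, not part of Dagu:

```shell
# Extract the zone name from a zoneinfo path, e.g. the target of /etc/localtime.
tz_from_path() {
  case "$1" in
    */zoneinfo/*) printf '%s\n' "${1#*/zoneinfo/}" ;;
    *) return 1 ;;
  esac
}

# Resolve the local zone; fall back to UTC when the symlink target is unusable.
target=$(readlink /etc/localtime 2>/dev/null || true)
DAGU_TZ=$(tz_from_path "$target" || echo UTC)
echo "DAGU_TZ=$DAGU_TZ"
```

The resulting value can then be passed to the container with -e DAGU_TZ="$DAGU_TZ".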

Quick Start

See the Quick Start Guide to create and execute your first DAG!

Building from Source

Dagu can be built and run locally from source.

Prerequisites

Make sure you have the tools used in the steps below installed on your system: Git, Go, Yarn (with Node.js), and Make.

Steps to Build Locally

1. Clone the repository

  • Clone the repository to your local machine using Git.
    git clone https://github.com/dagu-org/dagu.git
    cd dagu

2. Build the UI

  • Navigate to the ui directory.
    cd ui
  • Install / Update Dependencies
    yarn
  • Go back to the root directory.
    cd ..
  • Build the UI
    make build-ui

3. Build the Binary

  • Build the binary
    make build-bin
    This produces the dagu binary in the .local/bin directory.

Run Locally from Source

For a quick test of the server, scheduler, and UI together:

# Runs "dagu start-all" with the `go run` command
make run

Once the server is running, visit http://127.0.0.1:8080 to see the Web UI.

Continue with the Quick Start Guide to create and execute your first DAG!

Quick Start Guide

1. Launch the Web UI

Start the server and scheduler with the command dagu start-all and browse to http://127.0.0.1:8080 to explore the Web UI.

2. Create a New DAG

Navigate to the DAG List page by clicking the menu in the left panel of the Web UI. Then create a DAG by clicking the NEW button at the top of the page. Enter example as the DAG name in the dialog.

Note: DAG (YAML) files will be placed in ~/.config/dagu/dags by default. See Configuration Options for more details.

3. Edit the DAG

Go to the SPEC tab and click the Edit button. Copy and paste the following example, then click the Save button.

Example:

schedule: "* * * * *" # Run the DAG every minute
params:
  - NAME: "Dagu"
steps:
  - name: Hello world
    command: echo Hello $NAME
  - name: Done
    command: echo Done!
    depends: Hello world

4. Execute the DAG

Press the Start button to execute the example. "Hello Dagu" then appears on the log page in the Web UI.
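
The same example can also be created and run from a terminal. The sketch below writes the DAG to a file and starts it with dagu start, skipping the run when the binary is not on the PATH. The DAGS_DIR variable and the temporary directory are assumptions made for this sketch (the default location is ~/.config/dagu/dags), and the schedule line is left out so that the DAG only runs on demand:

```shell
# Hypothetical location for this sketch; override with DAGS_DIR if desired.
dags_dir="${DAGS_DIR:-$(mktemp -d)}"
mkdir -p "$dags_dir"

# Write the example DAG; the quoted heredoc keeps $NAME literal in the file.
cat > "$dags_dir/example.yaml" <<'EOF'
params:
  - NAME: "Dagu"
steps:
  - name: Hello world
    command: echo Hello $NAME
  - name: Done
    command: echo Done!
    depends: Hello world
EOF

# Run the DAG only when the dagu binary is available.
if command -v dagu >/dev/null 2>&1; then
  dagu start "$dags_dir/example.yaml"
else
  echo "dagu not found on PATH; wrote $dags_dir/example.yaml only"
fi
```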

Usage / Command Line Interface

# Runs the DAG
dagu start <file or DAG name>

# Runs the DAG with named parameters
dagu start <file or DAG name> [-- <key>=<value> ...]

# Runs the DAG with positional parameters
dagu start <file or DAG name> [-- value1 value2 ...]

# Displays the current status of the DAG
dagu status <file or DAG name>

# Re-runs the specified DAG run
dagu retry --req=<request-id> <file or DAG name>

# Stops the DAG execution
dagu stop <file or DAG name>

# Restarts the currently running DAG
dagu restart <file or DAG name>

# Dry-runs the DAG
dagu dry <file or DAG name> [-- <key>=<value> ...]

# Launches both the web UI server and scheduler process
dagu start-all [--host=<host>] [--port=<port>] [--dags=<path to directory>]

# Launches the Dagu web UI server
dagu server [--host=<host>] [--port=<port>] [--dags=<path to directory>]

# Starts the scheduler process
dagu scheduler [--dags=<path to directory>]

# Shows the current binary version
dagu version

Example DAG

Minimal Examples

A simple example with a named parameter:

params:
  - NAME: "Dagu"

steps:
  - name: Hello world
    command: echo Hello $NAME
  - name: Done
    command: echo Done!
    depends:
      - Hello world

Using a pipe:

steps:
  - name: step 1
    command: echo hello world | xargs echo

Specifying a shell:

steps:
  - name: step 1
    command: echo hello world | xargs echo
    shell: bash # The default shell is `$SHELL` or `sh`.

Named Parameters

You can define named parameters in the DAG file and override them when running the DAG.

# Default named parameters
params:
  NAME: "Dagu"
  AGE: 30

steps:
  - name: Hello world
    command: echo Hello $NAME
  - name: Done
    command: echo Done!
    depends: Hello world

Run the DAG with custom parameters:

dagu start my_dag -- NAME=John AGE=40

Positional Parameters

You can define positional parameters in the DAG file and override them when running the DAG.

# Default positional parameters
params: input.csv output.csv 60  # Default values for $1, $2, and $3

steps:
  # Using positional parameters
  - name: data processing
    command: python
    script: |
      import sys
      import pandas as pd
      
      input_file = "$1"    # First parameter
      output_file = "$2"   # Second parameter
      timeout = "$3"       # Third parameter
      
      print(f"Processing {input_file} -> {output_file} with timeout {timeout}s")
      # Add your processing logic here

Run the DAG with custom parameters:

dagu start my_dag -- input.csv output.csv 120

Conditional DAG

You can define conditions to run a step based on the output of a command.

steps:
  - name: monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "re:0[1-9]" # Run only if the day is between 01 and 09
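
A condition and expected pair can be checked in a terminal before it goes into a DAG. This rough sketch uses grep -E as a stand-in matcher. It assumes an unanchored search, and Dagu's own regex engine may behave differently. The helper name matches is hypothetical:

```shell
# Return success when the value matches the extended regular expression.
# Assumption: unanchored search, as grep -E does; Dagu's matcher may differ.
matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

day=$(date '+%d')
if matches "$day" '0[1-9]'; then
  echo "day $day: step would run"
else
  echo "day $day: step would be skipped"
fi
```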

Script Execution

You can run a script using the script field.

steps:
  # Python script example
  - name: data analysis
    command: python
    script: |
      import json
      import sys
      
      data = {'count': 100, 'status': 'ok'}
      print(json.dumps(data))
      sys.stderr.write('Processing complete\n')
    output: RESULT
    stdout: /tmp/analysis.log
    stderr: /tmp/analysis.error

  # Shell script with multiple commands
  - name: cleanup
    command: bash
    script: |
      #!/bin/bash
      echo "Starting cleanup..."
      
      # Remove old files
      find /tmp -name "*.tmp" -mtime +7 -exec rm {} \;
      
      # Archive logs
      cd /var/log
      tar -czf archive.tar.gz *.log
      
      echo "Cleanup complete"
    depends: data analysis

Variable Passing

You can pass the output of one step to another step using the output field.

steps:
  # Basic output capture
  - name: generate id
    command: echo "ABC123"
    output: REQUEST_ID

  - name: use id
    command: echo "Processing request ${REQUEST_ID}"
    depends: generate id

# Capture JSON output
steps:
  - name: get config
    command: |
      echo '{"port": 8080, "host": "localhost"}'
    output: CONFIG

  - name: start server
    command: echo "Starting server at ${CONFIG.host}:${CONFIG.port}"
    depends: get config

Scheduling

You can specify flexible schedules using the cron format.

schedule: "5 4 * * *" # Run at 04:05.

steps:
  - name: scheduled job
    command: job.sh

Or you can set multiple schedules.

schedule:
  - "30 7 * * *" # Run at 7:30
  - "0 20 * * *" # Also run at 20:00

steps:
  - name: scheduled job
    command: job.sh

If you want to start and stop a long-running process on a fixed schedule, you can define start and stop times:

schedule:
  start: "0 8 * * *" # starts at 8:00
  stop: "0 13 * * *" # stops at 13:00
steps:
  - name: scheduled job
    command: job.sh
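
Each schedule string has five space-separated fields in the standard cron order: minute, hour, day of month, month, and day of week. Here is a small sketch that labels the fields of an expression, which can help when reading schedules such as "5 4 * * *". The helper name cron_fields is hypothetical:

```shell
# Split a five-field cron expression into its named fields.
cron_fields() {
  set -f          # disable globbing so "*" stays literal
  set -- $1       # word-split the expression into $1..$5
  set +f
  [ "$#" -eq 5 ] || { echo "expected 5 fields, got $#" >&2; return 1; }
  echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
}

cron_fields "5 4 * * *"
# prints: minute=5 hour=4 day-of-month=* month=* day-of-week=*
```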

Calling a sub-DAG

You can call another DAG from a parent DAG.

steps:
  - name: parent
    run: sub-dag
    output: OUT
  - name: use output
    command: echo ${OUT.outputs.result}
    depends: parent

The sub-DAG sub-dag.yaml:

steps:
  - name: sub-dag
    command: echo "Hello from sub-dag"
    output: result

The parent DAG calls the sub-DAG and writes its output to the log (stdout). The output is Hello from sub-dag.

Running a docker image

You can run a docker image as a step:

steps:
  - name: hello
    executor:
      type: docker
      config:
        image: alpine
        autoRemove: true
    command: echo "hello"

Environment Variables

You can define environment variables and use them in the DAG.

env:
  - DATA_DIR: ${HOME}/data
  - PROCESS_DATE: "`date '+%Y-%m-%d'`"

steps:
  - name: process logs
    command: python process.py
    dir: ${DATA_DIR}
    preconditions:
      - "test -f ${DATA_DIR}/logs_${PROCESS_DATE}.txt" # Check if the file exists

Notifications on Failure or Success

You can send notifications on failure or success in several ways: a handlerOn command, email through mailOn, or the mail executor.

env:
  - SLACK_WEBHOOK_URL: "https://hooks.slack.com/services/XXXXX/YYYYY/ZZZZZ"

dotenv:
  - .env

smtp:
  host: $SMTP_HOST
  port: "587"
  username: $SMTP_USERNAME
  password: $SMTP_PASSWORD

handlerOn:
  failure:
    command: |
      curl -X POST -H 'Content-type: application/json' \
      --data '{"text":"DAG Failed ($DAG_NAME)"}' \
      ${SLACK_WEBHOOK_URL}

steps:
  - name: critical process
    command: important_job.sh
    retryPolicy:
      limit: 3
      intervalSec: 60
    mailOn:
      failure: true # Send an email on failure
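
The failure handler posts a small JSON body to the webhook. If the DAG name could contain quotes or backslashes, the body is safer to build with escaping. Here is a minimal sketch, in which slack_payload is a hypothetical helper and the fallback name example is only for illustration:

```shell
# Build the Slack JSON body, escaping backslashes and double quotes in the name.
slack_payload() {
  name=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"DAG Failed (%s)"}\n' "$name"
}

slack_payload "${DAG_NAME:-example}"

# Post only when a webhook URL is configured in the environment.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  curl -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "${DAG_NAME:-example}")" \
    "$SLACK_WEBHOOK_URL"
fi
```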

To apply these settings globally, create ~/.config/dagu/base.yaml and define the configuration shared by all DAGs there.

smtp:
  host: $SMTP_HOST
  port: "587"
  username: $SMTP_USERNAME
  password: $SMTP_PASSWORD

mailOn:
  failure: true                      
  success: true                      

You can also use the mail executor to send notifications.

params:
  - RECIPIENT_NAME: XXX
  - RECIPIENT_EMAIL: [email protected]
  - MESSAGE: "Hello [RECIPIENT_NAME]"

steps:
  - name: step1
    executor:
      type: mail
      config:
        to: $RECIPIENT_EMAIL
        from: [email protected]
        subject: "Hello [RECIPIENT_NAME]"
        message: $MESSAGE
          

HTTP Request and Notifications

You can make HTTP requests and send notifications.

dotenv:
  - .env

smtp:
  host: $SMTP_HOST
  port: "587"
  username: $SMTP_USERNAME
  password: $SMTP_PASSWORD

steps:
  - name: fetch data
    executor:
      type: http
      config:
        timeout: 10
    command: GET https://api.example.com/data
    output: API_RESPONSE

  - name: send notification
    executor:
      type: mail
      config:
        to: [email protected]
        from: [email protected]
        subject: "Data Processing Complete"
        message: |
          Process completed successfully.
          Response: ${API_RESPONSE}

    depends: fetch data

Execute commands over SSH

You can execute commands over SSH.

steps:
  - name: backup
    executor:
      type: ssh
      config:
        user: admin
        ip: 192.168.1.100
        key: ~/.ssh/id_rsa
    command: tar -czf /backup/data.tar.gz /data

Advanced Preconditions

You can define complex conditions to run a step based on the output of a command.

steps:
  # Check multiple conditions
  - name: daily task
    command: process_data.sh
    preconditions:
      # Run only on weekdays
      - condition: "`date '+%u'`"
        expected: "re:[1-5]"
      # Run only if more than 20% of disk space is free
      - condition: "`df -h / | awk 'NR==2 {print $5}' | sed 's/%//'`"
        expected: "re:^[0-7]?[0-9]$"  # 0-79% used (meaning more than 20% free)
      # Check if input file exists
      - condition: "test -f input.csv"

  # Complex file check
  - name: process files
    command: batch_process.sh
    preconditions:
      - condition: "`find data/ -name '*.csv' | wc -l`"
        expected: "re:[1-9][0-9]*"  # At least one CSV file exists
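
Numeric-range patterns are easy to get subtly wrong at the edges, so it can help to enumerate which values a pattern accepts before relying on it. This rough sketch counts how many whole percentages from 0 to 100 each of two candidate patterns for "below 80% used" accepts. It uses grep -E as a stand-in matcher, which is an assumption, and count_matches is a hypothetical helper:

```shell
# Count how many integers in 0..100 a pattern accepts.
count_matches() {
  pattern=$1
  n=0
  i=0
  while [ "$i" -le 100 ]; do
    if printf '%s\n' "$i" | grep -Eq -- "$pattern"; then
      n=$((n + 1))
    fi
    i=$((i + 1))
  done
  echo "$n"
}

count_matches '^[0-7][0-9]$|^[1-9]$'   # prints 79: a bare 0 is rejected
count_matches '^[0-7]?[0-9]$'          # prints 80: accepts 0 through 79
```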

Handling Various Execution Results

You can use continueOn to control when to fail or continue based on the exit code, output, or other conditions.

steps:
  # Basic error handling
  - name: process data
    command: python process.py
    continueOn:
      failure: true  # Continue on any failure
      skipped: true  # Continue if preconditions aren't met

  # Handle specific exit codes
  - name: data validation
    command: validate.sh
    continueOn:
      exitCode: [1, 2, 3]  # 1:No data, 2:Partial data, 3:Invalid format
      markSuccess: true    # Mark as success even with these codes

  # Output pattern matching
  - name: api request
    command: curl -s https://api.example.com/data
    continueOn:
      output:
        - "no records found"      # Exact match
        - "re:^Error: [45][0-9]"  # Regex match for HTTP errors
        - "rate limit exceeded"    # Another exact match

  # Complex pattern
  - name: database backup
    command: pg_dump database > backup.sql
    continueOn:
      exitCode: [0, 1]     # Accept specific exit codes
      output:              # Accept specific outputs
        - "re:0 rows affected"
        - "already exists"
      failure: false       # Don't continue on other failures
      markSuccess: true    # Mark as success if conditions match

  # Multiple conditions combined
  - name: data sync
    command: sync_data.sh
    continueOn:
      exitCode: [1]        # Exit code 1 is acceptable
      output:              # These outputs are acceptable
        - "no changes detected"
        - "re:synchronized [0-9]+ files"
      skipped: true       # OK if skipped due to preconditions
      markSuccess: true   # Mark as success in these cases

  # Error output handling
  - name: log processing
    command: process_logs.sh
    stderr: /tmp/process.err
    continueOn:
      output: 
        - "re:WARNING:.*"   # Continue on warnings
        - "no logs found"   # Continue if no logs
      exitCode: [0, 1, 2]   # Multiple acceptable exit codes
      failure: true         # Continue on other failures too

  # Application-specific status
  - name: app health check
    command: check_status.sh
    continueOn:
      output:
        - "re:STATUS:(DEGRADED|MAINTENANCE)"  # Accept specific statuses
        - "re:PERF:[0-9]{2,3}ms"             # Accept performance in range
      markSuccess: true                       # Mark these as success
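
For exitCode matching to be useful, the script has to return distinct codes for distinct outcomes. Here is a minimal sketch of what a script like validate.sh might look like, following the meanings given in the comment above (1: no data, 2: partial data, 3: invalid format). The checks themselves are hypothetical and only illustrate the pattern:

```shell
# Hypothetical validation: 0 = ok, 1 = no data, 2 = partial data, 3 = invalid format.
validate() {
  file=$1
  [ -s "$file" ] || return 1                     # missing or empty file
  head -n 1 "$file" | grep -q ',' || return 3    # header is not comma-separated
  rows=$(($(wc -l < "$file")))
  [ "$rows" -ge 2 ] || return 2                  # header only, no data rows
  return 0
}

sample=$(mktemp)
printf 'id,name\n' > "$sample"   # header without any rows
code=0
validate "$sample" || code=$?
echo "exit code: $code"          # prints "exit code: 2"
rm -f "$sample"
```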

JSON Processing Examples

You can use the jq executor to process JSON data.

# Simple data extraction
steps:
  - name: extract value
    executor: jq
    command: .user.name    # Get user name from JSON
    script: |
      {
        "user": {
          "name": "John",
          "age": 30
        }
      }

# Output: "John"

# Transform array data
steps:
  - name: get users
    executor: jq
    command: '.users[] | {name: .name}'    # Extract name from each user
    script: |
      {
        "users": [
          {"name": "Alice", "age": 25},
          {"name": "Bob", "age": 30}
        ]
      }

# Output:
# {"name": "Alice"}
# {"name": "Bob"}

# Calculate and format
steps:
  - name: sum ages
    executor: jq
    command: '{total_age: ([.users[].age] | add)}'    # Sum all ages
    script: |
      {
        "users": [
          {"name": "Alice", "age": 25},
          {"name": "Bob", "age": 30}
        ]
      }

# Output: {"total_age": 55}

# Filter and count
steps:
  - name: count active
    executor: jq
    command: '[.users[] | select(.active == true)] | length'
    script: |
      {
        "users": [
          {"name": "Alice", "active": true},
          {"name": "Bob", "active": false},
          {"name": "Charlie", "active": true}
        ]
      }

# Output: 2
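
A filter can be tried with the standalone jq command-line tool before it is placed in a step. This sketch assumes jq is installed and that the executor evaluates filters the same way as the CLI, which is not confirmed here. It falls back to a message when jq is missing:

```shell
# Sample input matching the "sum ages" example.
json='{"users":[{"name":"Alice","age":25},{"name":"Bob","age":30}]}'

if command -v jq >/dev/null 2>&1; then
  # -c prints compact, single-line JSON.
  total=$(printf '%s' "$json" | jq -c '{total_age: ([.users[].age] | add)}')
else
  total='jq not installed'
fi
echo "$total"
```

With jq available, this prints {"total_age":55}.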

More examples can be found in the documentation.

Web UI

DAG Details

Real-time status, logs, and configuration for each DAG. Toggle graph orientation from the top-right corner.

DAGs

View all DAGs in one place with live status updates.

Search

Search across all DAG definitions.

Execution History

Review past DAG executions and logs at a glance.

Log Viewer

Examine detailed step-level logs and outputs.

Running as a daemon

The easiest way to keep the process running on your system is to save the script below and run it every minute with cron. This approach does not require a root account:

#!/bin/bash
process="dagu start-all"
command="/usr/bin/dagu start-all"

if ps ax | grep -v grep | grep "$process" > /dev/null
then
    exit
else
    $command &
fi

exit

Contributing

We welcome new contributors! Check out our Contribution Guide for guidelines on how to get started.

License

Dagu is released under the GNU GPLv3.