Adds support for Corona to the llnl-cluster system #519

Draft
wants to merge 1 commit into base: develop

Conversation

ilumsden
Collaborator

@ilumsden ilumsden commented Jan 6, 2025

Description

This PR modifies the llnl-cluster system to also support Corona. This required a few Corona-specific tweaks to the LlnlCluster class, mainly because, unlike the other clusters associated with this system, Corona has AMD GPUs. Most changes relate to adding functions for injecting ROCm into the packages and compilers.
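As a rough illustration of the approach (not the actual diff), the sketch below shows one way a shared cluster class could add ROCm entries only when the selected cluster has AMD GPUs. Everything in it is hypothetical: the `ClusterSystem` class, the `GPU_VENDOR` table, the `other-cluster` entry, the method names, and the ROCm version and install prefix are placeholders, not taken from Benchpark or from this PR.

```python
# Hypothetical sketch: class, table, versions, and paths are placeholders,
# not Benchpark's API and not the code in this PR.

# Assumed mapping from cluster name to GPU vendor (None means no GPUs handled here).
GPU_VENDOR = {"other-cluster": None, "corona": "amd"}


class ClusterSystem:
    def __init__(self, cluster, rocm_version="0.0.0"):
        if cluster not in GPU_VENDOR:
            raise ValueError(f"unknown cluster: {cluster}")
        self.cluster = cluster
        self.rocm_version = rocm_version  # placeholder version string

    def has_rocm(self):
        """Return True only for clusters with AMD GPUs."""
        return GPU_VENDOR[self.cluster] == "amd"

    def external_packages(self):
        """Build a packages-style dict, adding ROCm entries only when needed."""
        packages = {
            "cmake": {"externals": [{"spec": "cmake", "prefix": "/usr"}]},
        }
        if self.has_rocm():
            packages.update(self._rocm_packages())
        return packages

    def _rocm_packages(self):
        """Describe ROCm components as non-buildable externals."""
        prefix = f"/opt/rocm-{self.rocm_version}"  # assumed install location
        return {
            name: {
                "buildable": False,
                "externals": [
                    {"spec": f"{name}@{self.rocm_version}", "prefix": prefix}
                ],
            }
            for name in ("hip", "rocblas")
        }
```

With this shape, the CPU-only clusters take the same code path as before, and only `ClusterSystem("corona")` picks up the extra ROCm entries.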

Dependencies: None

Fixes issue(s): None

Type of Change

  • { } Adding a system, benchmark, or experiment
  • {X} Modifying an existing system, benchmark, or experiment
  • { } Documentation update
  • { } Build/CI update
  • { } Benchpark core functionality

Checklist:

If adding/modifying a system:

  • {X} Create a new directory for the system and a new system.py file
  • { } Add a new dry run unit test in .github/workflows
  • { } System appears in System Specifications table in docs catalogue section

If adding/modifying a benchmark:

  • { } Add a new application.py and (maybe) package.py under a new directory
    for this benchmark
  • { } Configure an experiment
  • { } Benchmark appears in Benchmarks and Experiments table in docs catalogue
    section

If adding/modifying an experiment:

  • { } Extend experiment.py under existing directory for specific benchmark
  • { } Define single-node and multi-node experiments

If adding/modifying core functionality:

  • { } Update docs
  • { } Update .github/workflows and .gitlab/ci unit tests (if needed)

@ilumsden ilumsden self-assigned this Jan 6, 2025
@pearce8
Collaborator

pearce8 commented Jan 6, 2025

@ilumsden thanks for this! There may be two solutions:

  1. Corona could be one of the llnl-cluster clusters
  2. Corona might need to be its own system.py

I think this largely depends on how we are implementing "provides" and "requires". Since Corona has GPUs and ROCm, it would need to "provide" ROCm. @becker33, it would be good to have you weigh in here.
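
To make the "provides"/"requires" question concrete, here is a minimal sketch of one possible model: a system declares a set of capabilities, an experiment declares what it needs, and a check reports anything missing. All names (`SystemSpec`, `ExperimentSpec`, `missing_capabilities`, and the capability strings) are hypothetical and do not reflect how Benchpark implements or plans to implement this.

```python
# Hypothetical sketch of a "provides"/"requires" check; not Benchpark's API.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SystemSpec:
    name: str
    provides: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class ExperimentSpec:
    name: str
    requires: frozenset = field(default_factory=frozenset)


def missing_capabilities(system, experiment):
    """Return the capabilities the experiment requires but the system lacks."""
    return experiment.requires - system.provides


# Placeholder capability names, chosen only for illustration.
corona = SystemSpec("corona", frozenset({"mpi", "rocm"}))
cpu_only = SystemSpec("cpu-only", frozenset({"mpi"}))
gpu_run = ExperimentSpec("gpu-run", frozenset({"mpi", "rocm"}))

print(sorted(missing_capabilities(corona, gpu_run)))
print(sorted(missing_capabilities(cpu_only, gpu_run)))
```

Under a model like this, either option works: one llnl-cluster class whose `provides` set depends on the chosen cluster, or a separate system.py that always provides ROCm.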

@pearce8 pearce8 added the question Further information is requested label Jan 6, 2025