Infrastructure as code for GPU-accelerated managed Kubernetes clusters. These scripts automate the deployment of GPU-enabled Kubernetes clusters on various cloud service providers (CSPs).
Terraform is an open-source infrastructure as code software tool that we will use to automate the deployment of Kubernetes clusters with the required add-ons to enable NVIDIA GPUs. This repository contains Terraform modules, which are sets of Terraform configuration files ready for deployment. The modules in this repository can be incorporated into existing Terraform-managed infrastructure, or used to set up new infrastructure from scratch. You can learn more about Terraform here.
You can download Terraform (CLI) here.
NVIDIA offers support for Kubernetes through NVIDIA AI Enterprise. Refer to the product support matrix for supported managed Kubernetes platforms.
The Kubernetes clusters provisioned by the modules in this repository provide tested and certified versions of Kubernetes, the NVIDIA GPU Operator, and the NVIDIA Data Center Driver.
If your application does not require a specific version of Kubernetes, we recommend using the latest available version. We also recommend you plan to upgrade your version of Kubernetes at least every 6 months.
Each CSP has its own end-of-life dates for the Kubernetes versions it supports; for details, see each provider's Kubernetes release calendar and the table below.
| Version | Release Date | Kubernetes Versions | NVIDIA GPU Operator | NVIDIA Data Center Driver | End of Life |
|---|---|---|---|---|---|
| 0.2.0 | August 2023 | EKS - 1.26, GKE - 1.26, AKS - 1.26 | 23.3.2 | 535.54.03 (EKS & GKE) | EKS - June 2024, GKE - June 2024, AKS - March 2024 |
| 0.1.0 | June 2023 | EKS - 1.26, GKE - 1.26, AKS - 1.26 | 23.3.2 | 525.105.17 | EKS - June 2024, GKE - June 2024, AKS - March 2024 |
- Create an EKS Cluster
- Create an AKS Cluster
- Create a GKE Cluster
Call the EKS module by adding this to an existing Terraform file:
module "nvidia-eks" {
source = "git::github.com/nvidia/nvidia-terraform-modules/eks"
cluster_name = "nvidia-eks"
}
See the EKS README for all available configuration options.
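If you are starting from an empty directory rather than an existing configuration, a minimal root module might look like the following sketch. The region, version constraint, and file name are illustrative assumptions rather than requirements of the module; check the EKS README for the variables the module actually expects.

```hcl
# main.tf -- hypothetical minimal root module that calls the NVIDIA EKS module.
terraform {
  required_version = ">= 1.0" # illustrative constraint
}

provider "aws" {
  region = "us-west-2" # assumption: deploy to US West (Oregon)
}

module "nvidia-eks" {
  source       = "git::github.com/NVIDIA/nvidia-terraform-modules/eks"
  cluster_name = "nvidia-eks"
}
```

From that directory, `terraform init` downloads the module and providers, `terraform plan` previews the changes, and `terraform apply` provisions the cluster.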
Call the AKS module by adding this to an existing Terraform file:
module "nvidia-aks" {
source = "git::github.com/NVIDIA/nvidia-terraform-modules/aks"
cluster_name = "nvidia-aks-cluster"
admin_group_object_ids = [] # See description of this value in the AKS Readme
location = "us-west1"
}
See the AKS README for all available configuration options.
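If the `azurerm` provider is not already configured elsewhere in your configuration, note that it requires a provider block with the mandatory `features {}` argument. A minimal sketch, assuming you authenticate through the Azure CLI, is:

```hcl
# Required by the azurerm provider even when all of its settings are left at defaults.
provider "azurerm" {
  features {}
}
```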
Call the GKE module by adding this to an existing Terraform file:
module "nvidia-gke" {
source = "git::github.com/NVIDIA/nvidia-terraform-modules/gke"
cluster_name = "nvidia-gke-cluster"
project_id = "your-gcp-project-id"
region = "us-west1"
node_zones = ["us-west1-a"]
}
See the GKE README for all available configuration options.
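If your existing configuration does not already configure the `google` provider, a minimal sketch, assuming the same project and region as the module call above, is:

```hcl
provider "google" {
  project = "your-gcp-project-id" # assumption: same project as the module call
  region  = "us-west1"
}
```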
In each subdirectory, there is a Terraform module to provision the Kubernetes cluster and any additional prerequisite cloud infrastructure to launch CNPack. See CNPack on EKS, CNPack on GKE, and CNPack on AKS for more information and the sample CNPack configuration file.
More information on CNPack can be found in the NVIDIA AI Enterprise documentation.
These modules do not set up state management for the generated Terraform state file. Deleting the state file (`terraform.tfstate`) generated by Terraform could leave cloud resources that must be deleted manually, so we strongly encourage you to configure remote state.
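For example, a minimal sketch of a remote state configuration using the S3 backend is shown below; the bucket, key, region, and DynamoDB table are placeholders for resources that must already exist in your account. Equivalent backends are available for Azure (`azurerm`) and Google Cloud (`gcs`).

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket" # placeholder: an existing S3 bucket
    key            = "nvidia-eks/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-state-locks" # placeholder: optional table used for state locking
    encrypt        = true
  }
}
```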
Please see the Terraform Documentation for more information.
Pull requests are welcome! Please see our contribution guidelines.
Please open an issue on the GitHub project for any questions. Your feedback is appreciated.