Add the Datacenter use case #783

Merged 9 commits on Feb 11, 2025
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added matrix decomposition scheme to improve graph partitioning
- DrivAerML dataset support in FIGConvNet example.
- Retraining recipe for DoMINO from a pretrained model checkpoint
- Added Datacenter CFD use case.

### Changed

@@ -23,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Moved non-dimensionalization out of domino datapipe to datapipe in domino example
- Updated utils in `modulus.launch.logging` to avoid unnecessary `wandb` and `mlflow` imports
- Moved to experiment-based Hydra config in Lagrangian-MGN example
- Make data caching optional in `MeshDatapipe`

### Deprecated

Binary file added docs/img/datacenter_design_cfd.gif
Binary file added docs/img/datacenter_hybrid_training.png
84 changes: 84 additions & 0 deletions examples/cfd/datacenter/README.md
@@ -0,0 +1,84 @@
# Thermal and airflow surrogate model for Datacenter design

This example uses a deep learning model (a 3D UNet) as a surrogate for datacenter
airflow simulation, enabling real-time datacenter design. The workflow trains the
model to predict the temperature and airflow distribution within a hot aisle of a
typical datacenter. Given the geometry of the hot aisle (height, width, and
length) and the number of IT racks inside it, the trained model predicts the
temperature, velocity, and pressure distribution inside the hot aisle almost
instantaneously. This is useful for datacenter design, where iterating through
many design combinations is crucial to obtain optimal cooling and minimize
hotspots.

![Design study using the AI surrogate model](../../../docs/img/datacenter_design_cfd.gif)

## Dataset

The model is trained on OpenFOAM simulation data. Several hot aisle
configurations are generated by varying four parameters: length, height, width,
and number of racks. Each configuration is solved with OpenFOAM assuming maximum
flow rate and rack exit temperature (the max load condition). The simulations
are steady state, and the resulting OpenFOAM data is exported in VTK format for
training the AI surrogate. The dataset is then normalized using its mean and
standard deviation. The normalized dataset, along with a sample OpenFOAM
configuration, can be downloaded using the NGC link: [Link to be added].

**Note:** Access to NVAIE is required to download the dataset
and the reference OpenFOAM configuration.

**Note:** The OpenFOAM configuration provided is only representative.
Several key aspects have been masked to protect the IP.
Users should not expect to reproduce the training data
exactly with this setup; they will need to adapt it
to their own geometries and boundary conditions.
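
The normalization step described in this section can be sketched as follows.
This is a minimal illustration in plain Python, assuming a per-field z-score
normalization. The function names and the sample temperature values are
hypothetical and are not taken from the example's data pipeline.

```python
from statistics import fmean, pstdev


def fit_stats(values):
    """Return the (mean, standard deviation) of a flat list of field values."""
    return fmean(values), pstdev(values)


def normalize(values, mean, std, eps=1e-8):
    """Shift and scale values to roughly zero mean and unit variance."""
    return [(v - mean) / (std + eps) for v in values]


def denormalize(values, mean, std, eps=1e-8):
    """Invert normalize() to recover values in physical units."""
    return [v * (std + eps) + mean for v in values]


# Hypothetical temperature samples (kelvin), for illustration only.
temperature = [295.0, 300.0, 310.0, 315.0]
mean, std = fit_stats(temperature)
scaled = normalize(temperature, mean, std)
restored = denormalize(scaled, mean, std)
```

The same statistics must be reused at inference time so that model outputs can
be mapped back to physical units with `denormalize`.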

## Training

This problem uses a UNet model. The hex-dominant mesh used in this problem makes
a UNet an attractive choice, offering good speed and accuracy. Since the model
is primarily trained to capture changes in geometry, we use the Signed Distance
Field (SDF) of the interior of the hot aisle to represent the parameter
variation. Additionally, we add sinusoidal embeddings to help the model capture
sharp features in the flow field. Finally, to give all datacenter sizes a
uniform shape (for ingestion into the UNet), we pad the geometry to the maximum
hot aisle dimensions. This padding is removed before computing the loss.
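
The three ingredients above (SDF input, sinusoidal embeddings, and padding that
is removed before the loss) can be illustrated with a small sketch. It is
written in plain Python and uses 1D lists in place of 3D grids to stay short.
All function names, the SDF sign convention (negative inside the box), and the
choice of frequencies are assumptions for illustration, not the implementation
used in `train.py`.

```python
import math


def box_sdf(point, half_extents):
    """Signed distance from a point to an axis-aligned box centred at the origin.

    Assumed convention: negative inside the box, positive outside.
    """
    q = [abs(p) - h for p, h in zip(point, half_extents)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q), 0.0)
    return outside + inside


def sinusoidal_embedding(value, num_frequencies=3):
    """Encode a scalar as [sin, cos] pairs at frequencies pi, 2*pi, 4*pi, ..."""
    feats = []
    for k in range(num_frequencies):
        freq = (2.0 ** k) * math.pi
        feats.append(math.sin(freq * value))
        feats.append(math.cos(freq * value))
    return feats


def pad_to(values, max_len, fill=0.0):
    """Pad a list with a fill value so every sample has the same length."""
    return values + [fill] * (max_len - len(values))


def mse_without_padding(pred, target, valid_len):
    """Crop the padded region away, then compute the mean squared error."""
    pred, target = pred[:valid_len], target[:valid_len]
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / valid_len


sdf_centre = box_sdf((0.0, 0.0, 0.0), (1.0, 2.0, 3.0))   # inside the box
sdf_outside = box_sdf((2.0, 0.0, 0.0), (1.0, 2.0, 3.0))  # outside the box
target = pad_to([1.0, 2.0], max_len=4)
loss = mse_without_padding([1.0, 3.0, 9.0, 9.0], target, valid_len=2)
```

In this sketch the large errors in the padded cells (the two `9.0` entries) do
not contribute to `loss`, which mirrors the idea of removing the padding before
the loss is computed.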

The model can be trained by running the following command:

```bash
python train.py
```

To train on multiple GPUs:

```bash
mpirun -np <#GPUs> python train.py
```
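
For readers unfamiliar with MPI launches, the sketch below shows one way each
process started by `mpirun` could discover its rank and take a share of the
training samples. It assumes the Open MPI launcher, which sets the
`OMPI_COMM_WORLD_*` environment variables. The helper names are hypothetical;
the training script presumably relies on the framework's own distributed
utilities, which are not shown in this section.

```python
import os


def mpi_rank_info(env=None):
    """Read rank information from Open MPI environment variables.

    Falls back to a single-process setup when the variables are absent.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("OMPI_COMM_WORLD_RANK", 0)),
        "world_size": int(env.get("OMPI_COMM_WORLD_SIZE", 1)),
        "local_rank": int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", 0)),
    }


def shard_indices(num_samples, rank, world_size):
    """Assign every world_size-th sample to this rank (round-robin sharding)."""
    return list(range(rank, num_samples, world_size))


info = mpi_rank_info()
my_samples = shard_indices(8, info["rank"], info["world_size"])
```

With a plain `python train.py` launch the variables are unset, so the sketch
degrades to a single process that sees all samples.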

Once the model is trained, you can use the `inference.py` script to run
inference. To generate the Signed Distance Field and geometry for inference, we
use the utilities from Modulus-Sym.

### Training of Physics-Informed model

We also train a variant that adds physics losses to the data loss.
The physics-informed training can be run with the following command:

```bash
python train_physics_informed.py
```

Adding such physics losses proves very beneficial in the low-data regime, where
they can compensate for the lack of sufficient data.

![Comparison of data+physics driven training with pure data driven training](../../../docs/img/datacenter_hybrid_training.png)
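
A minimal sketch of how a physics loss might be combined with the data loss is
given below. It assumes a weighted sum using the `phy_wt` value from
`conf/config_physics_informed.yaml`, and it uses a 2D incompressible continuity
residual (du/dx + dv/dy) as a stand-in physics term. The actual equations and
the way `train_physics_informed.py` combines the terms are not shown in this
README, so the function names and the choice of residual are assumptions.

```python
def continuity_residual(u, v, dx, dy):
    """Central-difference du/dx + dv/dy at the interior nodes of a 2D grid.

    u[i][j] and v[i][j] are velocity components, with i along x and j along y.
    """
    nx, ny = len(u), len(u[0])
    residual = []
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            dudx = (u[i + 1][j] - u[i - 1][j]) / (2.0 * dx)
            dvdy = (v[i][j + 1] - v[i][j - 1]) / (2.0 * dy)
            residual.append(dudx + dvdy)
    return residual


def mean_square(values):
    """Mean of squared values."""
    return sum(x * x for x in values) / len(values)


def hybrid_loss(data_loss, physics_residual, phy_wt=0.001):
    """Data loss plus a weighted physics penalty (assumed weighted sum)."""
    return data_loss + phy_wt * mean_square(physics_residual)


# Divergence-free test field: u = x, v = -y on a 3x3 grid with unit spacing.
u_free = [[float(i) for _ in range(3)] for i in range(3)]
v_free = [[-float(j) for j in range(3)] for _ in range(3)]
loss_free = hybrid_loss(0.25, continuity_residual(u_free, v_free, 1.0, 1.0))

# Field that violates continuity: u = x, v = y.
v_bad = [[float(j) for j in range(3)] for _ in range(3)]
loss_bad = hybrid_loss(0.25, continuity_residual(u_free, v_bad, 1.0, 1.0))
```

The divergence-free field leaves the data loss unchanged, while the violating
field adds a penalty scaled by `phy_wt`.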

## Contributors

This example was developed as part of a collaboration between NVIDIA and Wistron.

## Resources

1. [Wistron Uses NVIDIA Omniverse and NVIDIA Modulus to Build Digital Twin Platform, Transforming Factory Planning and Operations](https://www.wistron.com/en/Newsroom/2024-03-19-1)
2. [Model Innovators: How Digital Twins Are Making Industries More Efficient](https://blogs.nvidia.com/blog/digital-twins-modulus-wistron/)
33 changes: 33 additions & 0 deletions examples/cfd/datacenter/conf/config.yaml
@@ -0,0 +1,33 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

hydra:
  job:
    chdir: True
  run:
    dir: ./outputs

start_epoch: 1
max_epochs: 260

start_lr: 1e-3
lr_scheduler_gamma: 0.99975

train_batch_size: 2
val_batch_size: 2

train_num_samples: 768
val_num_samples: 192
21 changes: 21 additions & 0 deletions examples/cfd/datacenter/conf/config_inference.yaml
@@ -0,0 +1,21 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

hydra:
  job:
    chdir: True
  run:
    dir: ./outputs
35 changes: 35 additions & 0 deletions examples/cfd/datacenter/conf/config_physics_informed.yaml
@@ -0,0 +1,35 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

hydra:
  job:
    chdir: True
  run:
    dir: ./outputs

start_epoch: 1
max_epochs: 260

start_lr: 1e-3
lr_scheduler_gamma: 0.99975

phy_wt: 0.001

train_batch_size: 2
val_batch_size: 2

train_num_samples: 64
val_num_samples: 192