grammar
Signed-off-by: Can-Zhao <[email protected]>
Can-Zhao committed Nov 21, 2024
1 parent 62e688c commit f43d3d5
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions generation/maisi/README.md
@@ -74,14 +74,14 @@ We retrained several state-of-the-art diffusion model-based methods using our da
| [512x512x512](./configs/config_infer_80g_512x512x512.json) |4x128x128x128| [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
| [512x512x768](./configs/config_infer_24g_512x512x768.json) |4x128x128x192| [80,80,112], 8 patches | 4 | 55G | 904s | 48s |

-**Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time cost of diffusion model inference. `VAE Time` refers to the time cost of VAE decoder inference. The total inference time is the `DM Time` plus `VAE Time`. When `autoencoder_sliding_window_infer_size` is equal or larger than the latent feature size, sliding window will not be used,
-and the time and memory cost remain the same. The experiment was tested on A100 80G GPU.
+**Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time required for diffusion model inference. `VAE Time` refers to the time required for VAE decoder inference. The total inference time is the sum of `DM Time` and `VAE Time`. The experiment was conducted on an A100 80G GPU.

-During inference, the peak GPU memory usage happens during the autoencoder decoding latent features.
-To reduce GPU memory usage, we can either increasing `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
-Increasing `autoencoder_tp_num_splits` has smaller impact on the generated image quality.
-Yet reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifact and has larger impact on the generated image quality.
+During inference, the peak GPU memory usage occurs during the autoencoder's decoding of latent features.
+To reduce GPU memory usage, we can either increase `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
+Increasing `autoencoder_tp_num_splits` has a smaller impact on the generated image quality, while reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifacts and has a larger impact on the generated image quality.
+
+When `autoencoder_sliding_window_infer_size` is equal to or larger than the latent feature size, the sliding window will not be used, and the time and memory costs remain the same.
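To apply this guidance in practice, here is a minimal sketch of overriding the two knobs before running inference. It assumes, as the config filenames in Table 3 suggest, that `autoencoder_tp_num_splits` and `autoencoder_sliding_window_infer_size` are top-level keys in the inference JSON configs; the output filename is hypothetical.

```python
import json

# Start from one of the inference configs listed in Table 3.
with open("configs/config_infer_80g_512x512x512.json") as f:
    config = json.load(f)

# Prefer raising tp_num_splits first: it cuts peak VAE-decoder memory
# with little effect on image quality. Shrinking the sliding-window
# size saves more memory but can introduce stitching artifacts.
config["autoencoder_tp_num_splits"] = 8
config["autoencoder_sliding_window_infer_size"] = [64, 64, 64]

# Save the adjusted config under a new (hypothetical) name.
with open("configs/config_infer_custom.json", "w") as f:
    json.dump(config, f, indent=4)
```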

### Training GPU Memory Usage
The VAE is trained on patches and thus can be trained on a 16G GPU if the patch size is set to a small value such as [64,64,64].
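To see why small patches keep VAE training within 16G: activation memory grows roughly with the number of voxels per patch, so a [64,64,64] patch needs about one eighth the activation memory of a [128,128,128] patch. A quick back-of-the-envelope check (the linear-scaling assumption is ours, not a measured figure):

```python
# Rough scaling of per-patch activation memory with patch size.
# Assumes memory grows roughly linearly with voxel count (illustrative only).
def voxels(patch):
    x, y, z = patch
    return x * y * z

small, large = [64, 64, 64], [128, 128, 128]
print(voxels(large) / voxels(small))  # 8.0 -> ~8x more memory per patch
```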
