diff --git a/generation/maisi/README.md b/generation/maisi/README.md
index a5ab82185..f848d1804 100644
--- a/generation/maisi/README.md
+++ b/generation/maisi/README.md
@@ -74,14 +74,14 @@ We retrained several state-of-the-art diffusion model-based methods using our da
 | [512x512x512](./configs/config_infer_80g_512x512x512.json) |4x128x128x128| [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
 | [512x512x768](./configs/config_infer_24g_512x512x768.json) |4x128x128x192| [80,80,112], 8 patches | 4 | 55G | 904s | 48s |
 
-**Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time cost of diffusion model inference. `VAE Time` refers to the time cost of VAE decoder inference. The total inference time is the `DM Time` plus `VAE Time`. When `autoencoder_sliding_window_infer_size` is equal or larger than the latent feature size, sliding window will not be used,
-and the time and memory cost remain the same. The experiment was tested on A100 80G GPU.
+**Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time required for diffusion model inference. `VAE Time` refers to the time required for VAE decoder inference. The total inference time is the sum of `DM Time` and `VAE Time`. The experiment was conducted on an A100 80G GPU.
+During inference, the peak GPU memory usage occurs during the autoencoder's decoding of latent features.
+To reduce GPU memory usage, we can either increase `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
+Increasing `autoencoder_tp_num_splits` has a smaller impact on the generated image quality, while reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifacts and has a larger impact on the generated image quality.
+
+When `autoencoder_sliding_window_infer_size` is equal to or larger than the latent feature size, the sliding window will not be used, and the time and memory costs remain the same.
 
-During inference, the peak GPU memory usage happens during the autoencoder decoding latent features.
-To reduce GPU memory usage, we can either increasing `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
-Increasing `autoencoder_tp_num_splits` has smaller impact on the generated image quality.
-Yet reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifact and has larger impact on the generated image quality.
 
 ### Training GPU Memory Usage
 VAE is trained on patches and thus can be trained with 16G GPU if patch size is set to be small like [64,64,64].
 
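The paragraph added above describes two knobs for trading GPU memory against image quality. As a minimal sketch of how they might be adjusted, assuming (hypothetically) that both parameters sit at the top level of the inference config JSON files referenced in Table 3; check the actual layout under `./configs/` before relying on this:

```python
import json

# Hedged sketch: lower peak GPU memory during MAISI VAE decoding by
# editing an inference config. The two parameter names come from the
# README text above; their exact position inside the JSON is an
# assumption, so inspect the real files under ./configs/ first.
config_path = "./configs/config_infer_80g_512x512x512.json"

with open(config_path) as f:
    config = json.load(f)

# Preferred knob: more tensor-parallel splits in the autoencoder.
# Per the README, raising this reduces memory with a smaller impact
# on the generated image quality. The value 8 is only an example.
config["autoencoder_tp_num_splits"] = 8

# Last-resort knob: a smaller sliding-window size for VAE decoding.
# Smaller windows save memory but may introduce stitching artifacts.
# [64, 64, 64] is a hypothetical value below the latent feature size.
# config["autoencoder_sliding_window_infer_size"] = [64, 64, 64]

with open(config_path, "w") as f:
    json.dump(config, f, indent=4)
```

Note that, as the added README text states, setting `autoencoder_sliding_window_infer_size` at or above the latent feature size disables sliding-window inference entirely, so only values below the latent size change the time and memory profile.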