diff --git a/generation/maisi/README.md b/generation/maisi/README.md
index a5ab82185..f848d1804 100644
--- a/generation/maisi/README.md
+++ b/generation/maisi/README.md
@@ -74,14 +74,14 @@ We retrained several state-of-the-art diffusion model-based methods using our da
 | [512x512x512](./configs/config_infer_80g_512x512x512.json) |4x128x128x128| [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
 | [512x512x768](./configs/config_infer_24g_512x512x768.json) |4x128x128x192| [80,80,112], 8 patches | 4 | 55G | 904s | 48s |
 
-**Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time cost of diffusion model inference. `VAE Time` refers to the time cost of VAE decoder inference. The total inference time is the `DM Time` plus `VAE Time`. When `autoencoder_sliding_window_infer_size` is equal or larger than the latent feature size, sliding window will not be used,
-and the time and memory cost remain the same. The experiment was tested on A100 80G GPU.
+**Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time required for diffusion model inference. `VAE Time` refers to the time required for VAE decoder inference. The total inference time is the sum of `DM Time` and `VAE Time`. The experiment was conducted on an A100 80G GPU.
+During inference, the peak GPU memory usage occurs during the autoencoder's decoding of latent features.
+To reduce GPU memory usage, we can either increase `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
+Increasing `autoencoder_tp_num_splits` has a smaller impact on the generated image quality, while reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifacts and has a larger impact on the generated image quality.
+
+When `autoencoder_sliding_window_infer_size` is equal to or larger than the latent feature size, the sliding window will not be used, and the time and memory costs remain the same.
 
-During inference, the peak GPU memory usage happens during the autoencoder decoding latent features.
-To reduce GPU memory usage, we can either increasing `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
-Increasing `autoencoder_tp_num_splits` has smaller impact on the generated image quality.
-Yet reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifact and has larger impact on the generated image quality.
 
 ### Training GPU Memory Usage
 VAE is trained on patches and thus can be trained with 16G GPU if patch size is set to be small like [64,64,64].
 
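The paragraph added above describes two knobs for trading GPU memory against image quality. As a minimal sketch of how they might be adjusted, assuming (hypothetically) that both parameters sit at the top level of the inference config JSON files referenced in Table 3; check the actual layout under `./configs/` before relying on this:

```python
import json

# Hedged sketch: lower peak GPU memory during MAISI VAE decoding by
# editing an inference config. The two parameter names come from the
# README text above; their exact position inside the JSON is an
# assumption, so inspect the real files under ./configs/ first.
config_path = "./configs/config_infer_80g_512x512x512.json"

with open(config_path) as f:
    config = json.load(f)

# Preferred knob: more tensor-parallel splits in the autoencoder.
# Per the README, raising this reduces memory with a smaller impact
# on the generated image quality. The value 8 is only an example.
config["autoencoder_tp_num_splits"] = 8

# Last-resort knob: a smaller sliding-window size for VAE decoding.
# Smaller windows save memory but may introduce stitching artifacts.
# [64, 64, 64] is a hypothetical value below the latent feature size.
# config["autoencoder_sliding_window_infer_size"] = [64, 64, 64]

with open(config_path, "w") as f:
    json.dump(config, f, indent=4)
```

Note that, as the added README text states, setting `autoencoder_sliding_window_infer_size` at or above the latent feature size disables sliding-window inference entirely, so only values below the latent size change the time and memory profile.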