
Official Implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction


Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction


Jing He¹*, Haodong Li¹*, Wei Yin², Yixun Liang¹, Leheng Li¹, Kaiqiang Zhou³, Hongbo Zhang³, Bingbing Liu³,
Ying-Cong Chen¹,⁴✉

¹HKUST(GZ)  ²University of Adelaide  ³Noah's Ark Lab  ⁴HKUST
*Both authors contributed equally. ✉Corresponding author.

[Teaser figure]

We present Lotus, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves state-of-the-art (SoTA) performance on two key geometry perception tasks: zero-shot depth and normal estimation. "Avg. Rank" is the average ranking across all metrics (lower is better), and bar length represents the amount of training data used.

📢 News

  • 2025-01-17: Please check out our latest models (lotus-normal-g-v1-1, lotus-normal-d-v1-1), which were trained with aligned surface normals, leading to improved performance!
  • 2024-11-13: The demo now supports video depth estimation!
  • 2024-11-13: The Lotus disparity models (Generative & Discriminative), which achieve better performance, are now available!
  • 2024-10-06: The demos are now available (Depth & Normal). Give them a try!
  • 2024-10-05: The inference code is now available!
  • 2024-09-26: Paper released. 3D point clouds of the teaser's depth maps are also available.

🛠️ Setup

This installation was tested on: Ubuntu 20.04 LTS, Python 3.10, CUDA 12.3, NVIDIA A800-SXM4-80GB.

  1. Clone the repository (requires git):
     git clone https://github.com/EnVision-Research/Lotus.git
     cd Lotus
  2. Install dependencies (requires conda):
     conda create -n lotus python=3.10 -y
     conda activate lotus
     pip install -r requirements.txt
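After installation, a quick report like the one below can help compare your machine against the tested setup (Python 3.10, CUDA 12.3, NVIDIA A800). This is a sketch, not part of the repository: `env_report` is a hypothetical helper, and it only prints what `python` and `nvidia-smi` report, assuming both are on your `PATH` once the `lotus` environment is active.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Lotus repository): print the Python
# version and the first GPU's name and driver version. It installs nothing.
env_report() {
  if command -v python >/dev/null 2>&1; then
    # 2>&1 also captures interpreters that print the version on stderr.
    echo "python: $(python --version 2>&1)"
  else
    echo "python: not found (is the 'lotus' conda environment active?)"
  fi
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "gpu: $(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | head -n 1)"
  else
    echo "gpu: nvidia-smi not found"
  fi
}

env_report
```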

🤗 Gradio Demo

  1. Online demo: Depth & Normal
  2. Local demo
  • For depth estimation, run:
    python app.py depth
    
  • For normal estimation, run:
    python app.py normal
    

🕹️ Usage

Testing on your images

  1. Place your images in a directory, for example, under assets/in-the-wild_example (where we have prepared several examples).
  2. Run the inference command: bash infer.sh.
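If your images are mixed in with other files, a small helper can copy them into one input directory first. This is a hypothetical sketch: `collect_images` is not part of the repository, the accepted extensions (.jpg, .jpeg, .png) are an assumption, and which variable in `infer.sh` sets the input directory has not been verified here, so check the script before pointing it at the new folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Lotus repository): copy the images found
# directly inside SRC into DST, then print how many regular files DST now holds.
# Assumption: only .jpg/.jpeg/.png inputs are wanted; adjust the patterns as needed.
collect_images() {
  local src="$1" dst="$2"
  mkdir -p "$dst" || return 1
  # -maxdepth 1: do not descend into subfolders; -iname: case-insensitive match.
  find "$src" -maxdepth 1 -type f \
    \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
    -exec cp -- {} "$dst"/ \;
  find "$dst" -maxdepth 1 -type f | wc -l
}

# Example usage (paths are placeholders):
#   collect_images ~/my_photos assets/my_images
```

The count includes any files that were already in the destination, so start from an empty folder if you want it to reflect only the copied images.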

Evaluation on benchmark datasets

  1. Prepare benchmark datasets:
  • For depth estimation, download the depth evaluation datasets with the following commands (following Marigold):
    cd datasets/eval/depth/
    
    wget -r -np -nH --cut-dirs=4 -R "index.html*" -P . https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
    
  • For normal estimation, download the normal evaluation datasets (dsine_eval.zip) into datasets/eval/normal/ and unzip the archive there (following DSINE).
  2. Run the evaluation command bash eval_scripts/eval-[task]-[mode].sh, where [task] is the task name (depth or normal) and [mode] is the mode name (d or g, i.e., discriminative or generative).
    (Optional) To reproduce the results reported in our paper, set the --rng_state_path option in the evaluation command. The RNG state files are available in ./rng_states/.
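Since the script name is built from the two placeholders, a small guard can catch a typo before a long evaluation run starts. A minimal sketch, assuming it is run from the repository root: `eval_script` is a hypothetical helper that only validates the pair and prints the path; it does not handle `--rng_state_path` or any other option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a task/mode pair to the evaluation script path
# following the eval_scripts/eval-[task]-[mode].sh naming scheme.
eval_script() {
  local task="$1" mode="$2"
  case "$task" in
    depth|normal) ;;
    *) echo "unknown task: $task (expected depth or normal)" >&2; return 1 ;;
  esac
  case "$mode" in
    d|g) ;;
    *) echo "unknown mode: $mode (expected d or g)" >&2; return 1 ;;
  esac
  echo "eval_scripts/eval-${task}-${mode}.sh"
}

# Example usage (commented out so this file only defines the helper):
#   script="$(eval_script depth g)" && bash "$script"
#   for task in depth normal; do
#     for mode in d g; do
#       script="$(eval_script "$task" "$mode")" && bash "$script"
#     done
#   done
```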

Choose your model

Below are the released models and their corresponding configurations:

CHECKPOINT_DIR                           TASK_NAME           MODE
jingheya/lotus-depth-g-v1-0              depth               generation
jingheya/lotus-depth-d-v1-0              depth               regression
jingheya/lotus-depth-g-v2-1-disparity    depth (disparity)   generation
jingheya/lotus-depth-d-v2-0-disparity    depth (disparity)   regression
jingheya/lotus-normal-g-v1-1             normal              generation
jingheya/lotus-normal-d-v1-1             normal              regression
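The checkpoint names in the table above follow a visible pattern, lotus-<task>-<g|d>-v<major>-<minor>, with an optional -disparity suffix. The sketch below decodes a name back into its TASK_NAME and MODE columns, which can serve as a sanity check when switching checkpoints. It is a hypothetical helper inferred only from the six names listed; it is not how the repository's scripts read these values, and a future checkpoint could break the pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a released checkpoint name into
# "TASK_NAME<TAB>MODE", using only the naming pattern seen in the table.
checkpoint_config() {
  local ckpt="$1"
  local re='^jingheya/lotus-(depth|normal)-([gd])-v[0-9]+-[0-9]+(-disparity)?$'
  if ! [[ "$ckpt" =~ $re ]]; then
    echo "unrecognized checkpoint name: $ckpt" >&2
    return 1
  fi
  # Copy the capture groups before any later test can touch BASH_REMATCH.
  local task="${BASH_REMATCH[1]}" letter="${BASH_REMATCH[2]}" suffix="${BASH_REMATCH[3]}"
  local mode=regression
  [[ "$letter" == g ]] && mode=generation
  [[ -n "$suffix" ]] && task="$task (disparity)"
  printf '%s\t%s\n' "$task" "$mode"
}

# Example usage:
#   checkpoint_config jingheya/lotus-depth-g-v2-1-disparity
```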

🎓 Citation

If you find our work useful in your research, please consider citing our paper:

@article{he2024lotus,
    title={Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction},
    author={He, Jing and Li, Haodong and Yin, Wei and Liang, Yixun and Li, Leheng and Zhou, Kaiqiang and Zhang, Hongbo and Liu, Bingbing and Chen, Ying-Cong},
    journal={arXiv preprint arXiv:2409.18124},
    year={2024}
}
