KovenYu/WonderWorld

Interactive 3D Scene Generation from a Single Image

Getting Started

Installation

Installation requires a CUDA-compatible GPU. Running the program requires 48GB of GPU memory.

Clone the repo and create the environment:

git clone https://github.com/KovenYu/WonderWorld.git && cd WonderWorld
mamba create --name wonderworld python=3.10
mamba activate wonderworld

We use PyTorch3D for rendering. Run the following commands to install it, or follow its installation guide (installation may take some time). We tested with CUDA 12.4; other CUDA versions should also work.

# switch to cuda 12.4, other versions should also work
mamba install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
mamba install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install submodules/depth-diff-gaussian-rasterization-min/
pip install submodules/simple-knn/
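
Once PyTorch is installed, a quick check can confirm that a CUDA device is visible and roughly meets the 48GB memory requirement stated above. The following is a minimal sketch and not part of the repository: the helper names `meets_memory_requirement` and `check_gpu` are hypothetical, and the threshold is compared in decimal gigabytes, which is an assumption about how the 48GB figure should be read.

```python
def meets_memory_requirement(total_bytes: int, required_gb: float = 48.0) -> bool:
    """Return True if total_bytes is at least required_gb decimal gigabytes."""
    return total_bytes / 1e9 >= required_gb


def check_gpu(required_gb: float = 48.0) -> str:
    """Describe the first CUDA device, or explain why none is usable."""
    try:
        import torch  # installed by the mamba command above
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return "No CUDA-compatible GPU is visible to PyTorch."
    props = torch.cuda.get_device_properties(0)
    ok = meets_memory_requirement(props.total_memory, required_gb)
    verdict = "OK" if ok else "below the stated requirement"
    return f"{props.name}: {props.total_memory / 1e9:.1f} GB ({verdict})"


if __name__ == "__main__":
    print(check_gpu())
```

The memory a driver reports can differ slightly from the figure on the product sheet, so treat the verdict as a rough guide rather than a guarantee.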

Install the rest of the requirements:

pip install -r requirements.txt
cd ./RepViT/sam && pip install -e . && cd ../..
python -m spacy download en_core_web_sm

Export your OpenAI API key (only needed if you want to use GPT-4 to generate scene descriptions):

export OPENAI_API_KEY='your_api_key_here'
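
Before enabling `use_gpt` in a config, it may help to confirm that the key is actually visible to Python processes started from the same shell. This is a minimal sketch; the helper name `gpt_description_available` is hypothetical, and it only checks that the variable is set and non-empty, not that the key is valid.

```python
import os


def gpt_description_available(env=None) -> bool:
    """Return True if a non-empty OPENAI_API_KEY is present in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if gpt_description_available():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is not set; keep use_gpt: False and enter descriptions manually.")
```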

Download the RepViT model and place it in the repository root directory:

wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt

Run examples

  • Example config file

    To run an example, first write a config file. An example config, ./config/example.yaml, is shown below (more examples are located in config/more_examples, feel free to try them):

    runs_dir: output/real_campus_2
    example_name: real_campus_2
    
    seed: 1
    # enable guided depth diffusion
    depth_conditioning: True
    
    # use gpt to generate scene description
    use_gpt: False
    debug: True
    
    # depth model and camera/depth parameters
    depth_model: marigold
    camera_speed: 0.001
    fg_depth_range: 0.015
    depth_shift: 0.001
    sky_hard_depth: 0.02
    init_focal_length: 960
    
    # re-generate sky panorama images
    gen_sky_image: False
    # generate sky point cloud
    gen_sky: False
    
    # enable layer-wise generation
    gen_layer: True
    # load previously generated gaussians
    load_gen: False
  • Run

    Local Visualization Setup:

    On your local laptop, git clone https://github.com/haoyi-duan/splat.git and open index_stream.html.

    To enable interactive visualization of your results through this local web browser, follow these steps:

    • Ensure you have ssh installed on your local machine.
    • The main program will run on the server user_id@server_name. Forward port 7777 from the server to your local machine:

    # On your local machine
    ssh -L 7777:localhost:7777 server_name

    Main Program Running:

    On the server, run the main program:

    # On user_id@server_name
    python run.py --example_config config/example.yaml --port 7777

    More examples are located at config/more_examples, feel free to try!

    Interactive Generation Step:

    Open index_stream.html on your local machine, and you should see the scene. You can navigate with the WASD and arrow keys.

    1. If you specify use_gpt=True in your example configuration file, the description for the new scene is generated automatically by the LLM; if you specify use_gpt=False, enter the scene description manually in the text box in the local browser. Remember to click 'Next scene is ...' when you are done.
    2. Next, set the camera view from which the program will generate the new scene: navigate in the browser to a novel view, then press R to have the program generate the new scene in that view.
    3. If you are not satisfied with the current generation, press Z to delete the most recent generation, then repeat steps 1 and 2 to generate again.
    4. By repeating steps 1-3, you interactively generate a large, connected scene, and you can wander through it freely during the whole process.
    5. After a few generations, press X to save the current scene. Next time, you can load the saved scene by specifying load_gen=True in your configuration file.
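
Reloading a saved scene comes down to flipping `load_gen` in the config. A minimal sketch of doing that from Python while keeping the comments in the file intact, assuming the flat `key: value` layout shown in ./config/example.yaml. The helper name `set_flag` and the output path config/example_resume.yaml are hypothetical, not part of the repository.

```python
import re
from pathlib import Path


def set_flag(config_text: str, key: str, value: bool) -> str:
    """Replace the value of a top-level `key: True/False` line, keeping comments."""
    pattern = re.compile(rf"^({re.escape(key)}\s*:\s*)(True|False)\b", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", config_text)
    if count == 0:
        raise KeyError(f"{key!r} not found as a top-level boolean in the config")
    return new_text


if __name__ == "__main__":
    src = Path("config/example.yaml")
    dst = Path("config/example_resume.yaml")  # hypothetical output name
    if src.exists():
        dst.write_text(set_flag(src.read_text(), "load_gen", True))
        print(f"Wrote {dst} with load_gen: True")
```

A plain text substitution is used here instead of a YAML round trip so that the explanatory comments in the config survive; the trade-off is that it only handles simple top-level boolean lines.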

How to add more examples?

We highly encourage you to add new images and try new things! You need to do the image-caption pairing separately (e.g., using DALL-E to generate the image and GPT-4V to generate the description).

  • Add a new image in ./examples/images/.

  • Add an entry describing this new image to ./examples/examples.yaml.

    Here is an example:

    - name: new_example
      image_filepath: examples/images/new_example.png
      style_prompt: DSLR 35mm landscape
      content_prompt: scene name, object 1, object 2, object 3
      negative_prompt: ''
      background: ''
    • content_prompt is a comma-separated list: the scene name followed by the objects in the scene ("scene name", "object 1", "object 2", "object 3").

    • negative_prompt and background are optional.

  • Write a config config/new_example.yaml like ./config/example.yaml for the new example.

  • Run the program as described in the previous section. (On first use, the program automatically generates the panorama sky images for the example, which takes about 20 minutes on an A6000 GPU. Once the sky images for the example are stored, later runs of this example skip this step automatically.)
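
When adding several images, the entry text for ./examples/examples.yaml can be generated rather than typed by hand. A minimal sketch, assuming the field names and layout shown in the example entry above; the helper `format_example_entry` is hypothetical and only emits text to paste into the file.

```python
def format_example_entry(name, scene_name, objects,
                         style_prompt="DSLR 35mm landscape",
                         image_ext="png"):
    """Build a YAML list entry in the layout shown for examples/examples.yaml."""
    # content_prompt is the scene name followed by the objects, comma-separated.
    content_prompt = ", ".join([scene_name, *objects])
    lines = [
        f"- name: {name}",
        f"  image_filepath: examples/images/{name}.{image_ext}",
        f"  style_prompt: {style_prompt}",
        f"  content_prompt: {content_prompt}",
        "  negative_prompt: ''",
        "  background: ''",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_example_entry(
        "new_example", "scene name", ["object 1", "object 2", "object 3"]
    ))
```

The sketch does no YAML quoting, so prompts containing characters such as `:` or `#` would need to be quoted by hand.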

Citation

@article{yu2024wonderworld,
    title={WonderWorld: Interactive 3D Scene Generation from a Single Image},
    author={Hong-Xing Yu and Haoyi Duan and Charles Herrmann and William T. Freeman and Jiajun Wu},
    journal={arXiv:2406.09394},
    year={2024}
}

Acknowledgement

We thank the authors of Marigold, SyncDiffusion, RepViT, Stable Diffusion, and OneFormer for sharing their code.