【NeurIPS 2024 🇨🇦】ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images
- We are the first to accomplish Open-Vocabulary 3D Object Detection tasks without using any 3D ground truth data.
- Thank you for 🌟 our ImOV3D.
Timing Yang*, Yuanliang Ju*, Li Yi
Shanghai Qi Zhi Institute, IIIS Tsinghua University, Shanghai AI Lab
To set up the project environment, follow these steps:
Create a virtual environment:
conda env create -f environment.yml
After creating the virtual environment, activate it with:
conda activate ImOV3D
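Before building the backbone, it can help to confirm that the right environment is active. The snippet below is a minimal sketch: `require_env` is a hypothetical helper (not part of this repository) that relies on the `CONDA_DEFAULT_ENV` variable, which conda sets on activation.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ImOV3D: fail early if the expected
# conda environment is not the active one.
require_env() {
    local expected="${1:-ImOV3D}"   # environment name used in this README
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "expected conda env '$expected', got '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

For example, `require_env && echo "environment OK"` prints the message only when `ImOV3D` is active.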
PointNet++ Backbone Installation
cd pointnet2
python setup.py install
cd ..
For detailed guidance on setting up the dataset for the pretraining stage, see the dataset instructions.
See Data Preparation for SUNRGBD or ScanNet.
You can also download the data from Baidu.
--[data_name] # Root directory of the dataset
├── [data_name]_2d_bbox_train # Training data with 2D bounding boxes
├── [data_name]_2d_bbox_val # Validation data with 2D bounding boxes
├── [data_name]_pc_bbox_votes_train # Training data with point cloud bounding box votes
├── [data_name]_pc_bbox_votes_val # Validation data with point cloud bounding box votes
├── [data_name]_trainval_train # Training data (2D image + Calib)
└── [data_name]_trainval_eval # Evaluation data (2D image + Calib)
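The layout above can be checked with a short script before training. This is only a sketch: `check_layout` is a hypothetical helper (not shipped with the repository), and it assumes the sub-directory prefix matches the name of the root directory, as the tree suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ImOV3D: list expected sub-directories
# that are missing under a dataset root laid out as in the tree above.
check_layout() {
    local root="${1%/}"              # dataset root, e.g. /path/to/[data_name]
    local name
    name="$(basename "$root")"       # assumed to equal the [data_name] prefix
    local missing=0 suffix
    for suffix in 2d_bbox_train 2d_bbox_val \
                  pc_bbox_votes_train pc_bbox_votes_val \
                  trainval_train trainval_eval; do
        if [ ! -d "${root}/${name}_${suffix}" ]; then
            echo "missing: ${root}/${name}_${suffix}"
            missing=1
        fi
    done
    return "$missing"                # 0 if complete, 1 if anything is missing
}
```

Calling `check_layout /path/to/[data_name]` prints one line per missing directory and returns a non-zero status if the layout is incomplete.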
Module | Description |
---|---|
PointCloudRender | Finetuned ControlNet |
DataSet | Description | Logs |
---|---|---|
LVIS | Pretrain Stage | SUNRGBD,ScanNet |
SUNRGBD | Adaptation Stage | SUNRGBD |
ScanNet | Adaptation Stage | ScanNet |
You can download them from Baidu.
1️⃣ Pretrain
Pretrain ImOV3D on the LVIS dataset:
bash ./scripts/train_lvis.sh
2️⃣ Adaptation
For the SUNRGBD dataset:
bash ./scripts/train_sunrgbd.sh
For the ScanNet dataset:
bash ./scripts/train_scannet.sh
3️⃣ Evaluation
To measure the effectiveness of the model, run the evaluation script:
bash ./scripts/eval.sh
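The three stages can also be chained so that a failure stops the run early. The wrapper below is a sketch under assumptions: `run_pipeline` is a hypothetical function, it must be called from the repository root, and it assumes the scripts need no manual edits (such as checkpoint paths) between stages, which may not hold for your setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of ImOV3D: run pretraining, adaptation,
# and evaluation in order, stopping at the first failing stage.
run_pipeline() {
    local dataset="$1"               # "sunrgbd" or "scannet"
    case "$dataset" in
        sunrgbd|scannet) ;;
        *) echo "usage: run_pipeline sunrgbd|scannet" >&2; return 2 ;;
    esac
    bash ./scripts/train_lvis.sh || return 1            # 1: pretrain on LVIS
    bash "./scripts/train_${dataset}.sh" || return 1    # 2: adaptation
    bash ./scripts/eval.sh                              # 3: evaluation
}
```

For example, `run_pipeline scannet` runs the LVIS pretraining, then the ScanNet adaptation, then the evaluation script.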
If you have any questions, please feel free to contact us:
Timing Yang: [email protected]
Yuanliang Ju: [email protected]
Our code is based on ImVoteNet, OV-3DET, Detic, ControlNet, ZoeDepth, surface_normal_uncertainty.
@article{yang2024imov3d,
title={ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images},
author={Yang, Timing and Ju, Yuanliang and Yi, Li},
journal={NeurIPS 2024},
year={2024}
}