【NeurIPS 2024 🇨🇦】ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images

We are the first to accomplish Open-Vocabulary 3D Object Detection tasks without using any 3D ground truth data.
Thank you for starring 🌟 our ImOV3D.

Timing Yang*, Yuanliang Ju*, Li Yi
Shanghai Qi Zhi Institute, IIIS Tsinghua University, Shanghai AI Lab

Overall Pipeline

Environment Setup

To set up the project environment, follow these steps:

Create a virtual environment:

conda env create -f environment.yml

After creating the virtual environment, activate it with:

conda activate ImOV3D

PointNet++ Backbone Installation

cd pointnet2
python setup.py install
cd ..

Dataset Preparation

Pretrain Stage

For detailed guidance on setting up the dataset for the pretraining stage, see the dataset instructions.

Adaptation

See Data Preparation for SUNRGBD or ScanNet.

You can also download the data from Baidu.

Format

--[data_name]  # Root directory of the dataset
  ├── [data_name]_2d_bbox_train       # Training data with 2D bounding boxes
  ├── [data_name]_2d_bbox_val         # Validation data with 2D bounding boxes
  ├── [data_name]_pc_bbox_votes_train # Training data with point cloud bounding box votes
  ├── [data_name]_pc_bbox_votes_val   # Validation data with point cloud bounding box votes
  ├── [data_name]_trainval_train      # Training data (2D image + Calib)
  └── [data_name]_trainval_eval       # Evaluation data (2D image + Calib)
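Before training, it can help to confirm that a downloaded or generated dataset matches this layout. The following is a minimal sketch, not part of the ImOV3D codebase: the helper names `expected_subdirs` and `missing_subdirs` are hypothetical, and it assumes `[data_name]` is substituted literally as a prefix of each subdirectory name.

```python
from pathlib import Path

# Suffixes copied from the directory tree above.
SUFFIXES = (
    "2d_bbox_train",
    "2d_bbox_val",
    "pc_bbox_votes_train",
    "pc_bbox_votes_val",
    "trainval_train",
    "trainval_eval",
)


def expected_subdirs(data_name):
    """Return the six subdirectory names expected under the dataset root."""
    return [f"{data_name}_{suffix}" for suffix in SUFFIXES]


def missing_subdirs(root, data_name):
    """Return the expected subdirectories that do not exist under `root`."""
    root = Path(root)
    return [
        name
        for name in expected_subdirs(data_name)
        if not (root / name).is_dir()
    ]
```

Calling `missing_subdirs(root, data_name)` with the dataset root and the chosen data name returns an empty list when all six subdirectories are present; any names it returns are the ones still to be prepared or downloaded.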

Pretrain Weight

Module	Description
PointCloudRender	Finetuned ControlNet

DataSet	Description	Logs
LVIS	Pretrain Stage	SUNRGBD,ScanNet
SUNRGBD	Adaptation Stage	SUNRGBD
ScanNet	Adaptation Stage	ScanNet

You can download them from Baidu.

Training and Evaluation

1️⃣ Pretrain

Pretrain ImOV3D on the LVIS dataset:

bash ./scripts/train_lvis.sh

2️⃣ Adaptation

For the SUNRGBD dataset:

bash ./scripts/train_sunrgbd.sh

For the ScanNet dataset:

bash ./scripts/train_scannet.sh

3️⃣ Evaluation

To measure the effectiveness of the model, run the evaluation script:

bash ./scripts/eval.sh
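The three stages run in order: pretraining on LVIS, adaptation on one target dataset, then evaluation. As a rough sketch, that order can be captured in a small driver. The helpers `stage_commands` and `run_stages` are hypothetical and not part of the repository. The sketch assumes the scripts are launched from the repository root and that each script is already configured (checkpoint paths, target dataset for `eval.sh`), since this README does not specify their arguments.

```python
import subprocess

# Adaptation scripts named in this README, keyed by target dataset.
ADAPTATION_SCRIPTS = {
    "sunrgbd": "./scripts/train_sunrgbd.sh",
    "scannet": "./scripts/train_scannet.sh",
}


def stage_commands(dataset):
    """Return the commands for pretrain, adaptation, and evaluation, in order."""
    key = dataset.lower()
    if key not in ADAPTATION_SCRIPTS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        ["bash", "./scripts/train_lvis.sh"],
        ["bash", ADAPTATION_SCRIPTS[key]],
        ["bash", "./scripts/eval.sh"],
    ]


def run_stages(dataset):
    """Run each stage in turn; check=True stops at the first failing script."""
    for cmd in stage_commands(dataset):
        subprocess.run(cmd, check=True)
```

If the pretrained weights are downloaded rather than trained locally, the first command can simply be dropped from the list before running.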

Contact

If you have any questions, please feel free to contact us:

Timing Yang: timingya@usc.edu
Yuanliang Ju: yuanliang.ju@mail.utoronto.ca

Acknowledgement

Our code is based on ImVoteNet, OV-3DET, Detic, ControlNet, ZoeDepth, and surface_normal_uncertainty.

Citation

@article{yang2024imov3d,
  title={ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images},
  author={Yang, Timing and Ju, Yuanliang and Yi, Li},
  journal={NeurIPS 2024},
  year={2024}
}
