*Ruoyu Wang, *Weihan Xu, *Xihang Yu
*Equal Contribution
Robotic applications rely on scene understanding to analyze objects within a 3D environment. One crucial component of scene understanding is semantic labeling, which assigns class labels to semantic regions based on the objects they contain. A recent study, *Leveraging Large Language Model for 3D Scene Understanding*, found that Large Language Models (LLMs) effectively incorporate common-sense knowledge during the labeling process. In this project, we compare two LLMs, GPT-J and RoBERTa, using fine-tuned feed-forward and contrastive networks, which were not evaluated in that study, on the semantic labeling task. The contributions of this project are twofold: (i) the proposed GPT-J model with a fine-tuned feed-forward network achieves state-of-the-art (SOTA) performance, and (ii) by varying the number of candidate objects, adopting ChatGPT-based room detection, and fine-tuning a whole BERT-based network, we explore the possible performance bottleneck of our proposed GPT-J pretrained network.
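For orientation, the core idea behind the GPT-J feed-forward variant can be sketched as follows: a frozen LLM turns a description of a region's objects into a sentence embedding, and a small trainable head maps that embedding to room labels. The model ID, prompt wording, pooling strategy, and layer sizes below are illustrative assumptions, not the repository's exact configuration.

```python
# Minimal sketch of a feed-forward head over frozen GPT-J embeddings.
# Hidden sizes, pooling, prompt, and label count are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
backbone = AutoModel.from_pretrained("EleutherAI/gpt-j-6B")  # large download
backbone.eval()  # the LLM stays frozen; only the head below is trained

class RoomClassifier(nn.Module):
    def __init__(self, embed_dim=4096, num_rooms=40):  # GPT-J hidden size is 4096
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 512), nn.ReLU(), nn.Linear(512, num_rooms)
        )

    def forward(self, embeddings):
        return self.head(embeddings)

@torch.no_grad()
def embed(sentence):
    # Mean-pool the final hidden states into one sentence embedding.
    inputs = tokenizer(sentence, return_tensors="pt")
    return backbone(**inputs).last_hidden_state.mean(dim=1)

clf = RoomClassifier()
logits = clf(embed("This room contains a bed, a nightstand, and a lamp."))
```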
Before starting, you will need:
- A CUDA-enabled GPU (we used an RTX 3080 with 16 GB of memory)
- A corresponding version of CUDA (we used v11.1)
- Python 3.8 with venv
- Pip package manager
After cloning this repo:
- Create a virtual environment: `python3 -m venv /path/to/llm_su_venv`
- Source the environment: `source /path/to/llm_su_venv/bin/activate`
- Enter this repo and install all requirements: `pip install -r requirements.txt`
- Note that some libraries listed in that file are no longer necessary, as it was procedurally generated. One can alternatively go through the scripts one wishes to run and install their individual dependencies. Such dependencies include: `numpy`, `scipy`, `torch`, `torch_geometric`, `torchvision`, `matplotlib`, `transformers`, `tqdm`, `pandas`, `gensim`, `sympy`.
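Before running anything, a quick sanity check (ours, not one of the repo's scripts) can confirm that the key dependencies import and that PyTorch sees the GPU:

```python
# Verify that the core dependencies import and that CUDA is visible.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```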
- `python zero_shot_<rooms/bldgs>.py` runs our zero-shot language approach on the entire Matterport3D dataset to predict either rooms given objects or buildings given rooms (a sketch of the scoring idea follows this list).
- `python <ff/contrastive>_train(_gptj).py` runs our feed-forward or contrastive training approaches, respectively.
- Run `python data_generator(_gptj).py` prior to the above to generate the bootstrapped data needed for training and evaluation.
- `python bldg_ff_train(_gptj).py` and `python bldg_data_generator_comparison(_gptj).py` are the equivalents for building prediction. Note that this data generator does not bootstrap datapoints for the test set; it uses the same test set as `zero_shot_bldgs.py` for easier comparison.
- Run `python create_label_embedding_gptj.py` to create room-label string embeddings for the contrastive network (see the contrastive loss sketch after this list).
- `python <ff/contrastive>_holdout_tests.py` runs training on a dataset with certain objects withheld, then evaluates on datapoints containing those previously-unseen objects.
- `python <ff/contrastive>_label_space_test.py` runs training on the mpcat40 label space dataset, then evaluates on the larger nyuClass label space dataset.
- Some other utility functions and scripts are included as well, such as `compute_cooccurrencies.py`, which generates co-occurrence matrices (i.e., counting frequencies of room-object pairs); a minimal example follows this list.
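As a rough illustration of the zero-shot scoring idea (our sketch, not the repository's exact prompts, pooling, or model configuration): embed a sentence describing a region's objects and each candidate room label with the same LLM, then pick the label with the highest cosine similarity.

```python
# Sketch of zero-shot room labeling by embedding similarity.
# The prompt wording and mean pooling are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

@torch.no_grad()
def embed(text):
    inputs = tokenizer(text, return_tensors="pt")
    return model(**inputs).last_hidden_state.mean(dim=1).squeeze(0)

labels = ["bedroom", "kitchen", "bathroom", "living room"]
query = embed("A room containing a bed, a dresser, and a nightstand.")
scores = {lbl: F.cosine_similarity(query, embed(f"A {lbl}."), dim=0).item()
          for lbl in labels}
print(max(scores, key=scores.get))  # highest-similarity label wins
```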
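The contrastive approach trains description embeddings to land near the precomputed room-label embeddings. A toy sketch of such an objective, assuming an InfoNCE-style loss with an arbitrary temperature (both our assumptions, not the repo's exact loss):

```python
# Toy contrastive objective: pull each description embedding toward its
# room-label embedding and away from the other labels (InfoNCE-style).
import torch
import torch.nn.functional as F

def contrastive_loss(desc_emb, label_emb, targets, temperature=0.07):
    # desc_emb: (batch, dim) description embeddings (trainable projection)
    # label_emb: (num_labels, dim) fixed room-label embeddings
    desc_emb = F.normalize(desc_emb, dim=-1)
    label_emb = F.normalize(label_emb, dim=-1)
    logits = desc_emb @ label_emb.T / temperature  # (batch, num_labels)
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(4, 128), torch.randn(40, 128),
                        torch.tensor([3, 7, 0, 12]))
```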
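And a minimal example of what a co-occurrence count looks like, using hypothetical annotations (`compute_cooccurrencies.py` operates on the actual Matterport3D data):

```python
# Count how often each (room, object) pair occurs across annotated regions.
import pandas as pd

# Hypothetical annotations; the real script reads Matterport3D region data.
annotations = pd.DataFrame({
    "room":   ["bedroom", "bedroom", "kitchen", "kitchen", "bathroom"],
    "object": ["bed",     "lamp",    "stove",   "sink",    "sink"],
})

cooccurrence = pd.crosstab(annotations["room"], annotations["object"])
print(cooccurrence)
```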
BERT fine-tuning code is located in branch `BERT-Classifier`, which has a separate README file on how to generate data and fine-tune the whole BERT-based classifier. Environment requirements for `BERT-Classifier` are the same as above.
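For reference, end-to-end fine-tuning of a BERT-style classifier with Hugging Face Transformers generally follows the pattern below. This is a generic sketch rather than the branch's actual script; the checkpoint, label count, and single training step are illustrative assumptions.

```python
# Generic sketch of fine-tuning a BERT classifier end-to-end.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=40)  # assumed number of room classes

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()

# One illustrative training step on a hypothetical datapoint.
batch = tokenizer(["This room contains a bed and a lamp."],
                  return_tensors="pt", padding=True)
labels = torch.tensor([0])  # hypothetical room-label index

outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```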