Zehan Wang1* · Ziang Zhang1* · Tianyu Pang2 · Chao Du2 · Hengshuang Zhao3 · Zhou Zhao1
1Zhejiang University 2SEA AI Lab 3HKU
*Equal Contribution
Orient Anything is a robust image-based object orientation estimation model. Trained on 2M rendered, labeled images, it achieves strong zero-shot generalization to in-the-wild images.
- 2024-12-24: Paper, Project Page, Code, Models, and Demo are released.
We provide three models of varying scales for robust object orientation estimation in images:
| Model | Params | Checkpoint |
|---|---|---|
| Orient-Anything-Small | 23.3 M | Download |
| Orient-Anything-Base | 87.8 M | Download |
| Orient-Anything-Large | 305 M | Download |
```bash
pip install -r requirements.txt
```
Start the Gradio demo by running:
```bash
python app.py
```
then open the GUI page (default: http://127.0.0.1:7860) in a web browser,
or try it in our Hugging Face Space.
```python
from paths import *
from vision_tower import DINOv2_MLP
from transformers import AutoImageProcessor
import torch
from PIL import Image
import torch.nn.functional as F
from utils import *
from inference import *
from huggingface_hub import hf_hub_download

# Download the Orient-Anything-Large checkpoint from the Hugging Face Hub
ckpt_path = hf_hub_download(repo_id="Viglong/Orient-Anything", filename="croplargeEX2/dino_weight.pt", repo_type="model", cache_dir='./', resume_download=True)
print(ckpt_path)

save_path = './'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# DINOv2 backbone + MLP head; the output covers 360 azimuth bins,
# 180 polar bins, 180 rotation bins, and 2 confidence logits
dino = DINOv2_MLP(
    dino_mode   = 'large',
    in_dim      = 1024,
    out_dim     = 360 + 180 + 180 + 2,
    evaluate    = True,
    mask_dino   = False,
    frozen_back = False
)
dino.eval()
print('model created')
dino.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
dino = dino.to(device)
print('weights loaded')

# Preprocessor matching the DINOv2-Large backbone (DINO_LARGE is defined in paths.py)
val_preprocess = AutoImageProcessor.from_pretrained(DINO_LARGE, cache_dir='./')

image_path = '/path/to/image'
origin_image = Image.open(image_path).convert('RGB')

# Predict the three orientation angles and a confidence score
angles     = get_3angle(origin_image, dino, val_preprocess, device)
azimuth    = float(angles[0])
polar      = float(angles[1])
rotation   = float(angles[2])
confidence = float(angles[3])
```
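The three angles follow the output head above: azimuth spans 360°, while polar and rotation each span 180°. As a purely illustrative helper (not part of this repository), the sketch below converts azimuth and polar into a 3D unit vector for the object's facing direction, assuming angles in degrees and a right-handed, z-up coordinate frame:

```python
import math

def facing_direction(azimuth_deg: float, polar_deg: float):
    """Convert predicted azimuth/polar angles (degrees) into a unit
    direction vector, assuming a right-handed, z-up frame. Hypothetical
    helper for illustration; not part of the Orient Anything codebase."""
    az = math.radians(azimuth_deg)
    po = math.radians(polar_deg)
    return (
        math.cos(po) * math.cos(az),  # x
        math.cos(po) * math.sin(az),  # y
        math.sin(po),                 # z
    )
```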
To avoid ambiguity, our model only supports input images that contain a single object. For everyday images that usually contain multiple objects, a good approach is to isolate each object with Grounding DINO and predict its orientation separately.
[ToDo]
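A minimal sketch of such a multi-object pipeline, where the `boxes` argument stands in for the output of any detector (e.g., Grounding DINO); this helper is a hypothetical illustration that reuses `get_3angle()` from the inference snippet above, not repository code:

```python
def predict_orientations(image, boxes, model, preprocess, device):
    """Predict the orientation of each detected object separately.

    `image` is a PIL image; `boxes` is assumed to be a list of
    (x0, y0, x1, y1) pixel tuples from any object detector, e.g.
    Grounding DINO. Hypothetical helper for illustration only.
    """
    results = []
    for box in boxes:
        crop = image.crop(box)  # isolate a single object to avoid ambiguity
        angles = get_3angle(crop, model, preprocess, device)
        results.append({
            'box': box,
            'azimuth': float(angles[0]),
            'polar': float(angles[1]),
            'rotation': float(angles[2]),
            'confidence': float(angles[3]),
        })
    return results
```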
To further enhance the robustness of the model, we propose a test-time ensemble strategy: the input image is randomly cropped into several variants, and the orientations predicted for these variants are voted to produce the final result. This strategy is implemented in the functions `get_3angle_infer_aug()` and `get_crop_images()`, as shown in the sketch below.
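For example, assuming `get_3angle_infer_aug()` shares the signature of `get_3angle()` used in the inference snippet above (an assumption, not a documented API):

```python
# Drop-in replacement for get_3angle(): random crops are produced by
# get_crop_images() internally and the per-crop predictions are voted.
angles = get_3angle_infer_aug(origin_image, dino, val_preprocess, device)
azimuth, polar, rotation, confidence = (float(a) for a in angles)
```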
If you find this project useful, please consider citing:
```bibtex
@article{orient_anything,
  title={Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models},
  author={Wang, Zehan and Zhang, Ziang and Pang, Tianyu and Du, Chao and Zhao, Hengshuang and Zhao, Zhou},
  journal={arXiv:2412.18605},
  year={2024}
}
```
Thanks to the following open-source projects: Grounded-Segment-Anything, render-py.