Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang1* · Ziang Zhang1* · Tianyu Pang2 · Du Chao2 · Hengshuang Zhao3 · Zhou Zhao1

1Zhejiang University    2SEA AI Lab    3HKU

*Equal Contribution

Paper PDF Project Page

Orient Anything is a robust image-based object orientation estimation model. Trained on 2M rendered, labeled images, it achieves strong zero-shot generalization to images in the wild.


News

Pre-trained models

We provide three models of varying scales for robust object orientation estimation in images:

| Model | Params | Checkpoint |
|---|---|---|
| Orient-Anything-Small | 23.3 M | Download |
| Orient-Anything-Base | 87.8 M | Download |
| Orient-Anything-Large | 305 M | Download |

Usage

1 Preparation

pip install -r requirements.txt

2 Use our models

2.1 In Gradio app

Start gradio by executing the following script:

python app.py

Then open the GUI page (default: http://127.0.0.1:7860) in a web browser.

Alternatively, you can try it in our Hugging Face Space.

2.2 In Python Scripts

# Repository-local modules and third-party dependencies
from paths import *
from vision_tower import DINOv2_MLP
from transformers import AutoImageProcessor
import torch
from PIL import Image

import torch.nn.functional as F
from utils import *
from inference import *

from huggingface_hub import hf_hub_download

# Download the large checkpoint from the Hugging Face Hub into the current directory
ckpt_path = hf_hub_download(
    repo_id="Viglong/Orient-Anything",
    filename="croplargeEX2/dino_weight.pt",
    repo_type="model",
    cache_dir='./',
    resume_download=True,
)
print(ckpt_path)

# Use the GPU when one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Build the model; in_dim matches the DINOv2-Large feature size
dino = DINOv2_MLP(
    dino_mode   = 'large',
    in_dim      = 1024,
    out_dim     = 360+180+180+2,
    evaluate    = True,
    mask_dino   = False,
    frozen_back = False,
)
dino.eval()
print('model created')

# Load the weights on CPU first, then move the model to the target device
dino.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
dino = dino.to(device)
print('weights loaded')

# Image preprocessor matching the DINOv2-Large backbone
val_preprocess = AutoImageProcessor.from_pretrained(DINO_LARGE, cache_dir='./')

# Predict the orientation of a single-object image
image_path = '/path/to/image'
origin_image = Image.open(image_path).convert('RGB')
angles = get_3angle(origin_image, dino, val_preprocess, device)
azimuth    = float(angles[0])
polar      = float(angles[1])
rotation   = float(angles[2])
confidence = float(angles[3])
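The `out_dim = 360+180+180+2` setting suggests that the model emits one flat vector of 722 logits. The sketch below shows one plausible way such a vector could be split and decoded. It assumes, based only on the order of the returned values, that the layout is 360 azimuth bins, 180 polar bins, 180 rotation bins, and 2 confidence logits, and that the first confidence logit is the positive class. The helper names `split_heads` and `decode` are hypothetical, and the mapping from bin index to degrees done inside `get_3angle` is not reproduced here.

```python
import math

# Assumed head sizes, read off out_dim = 360+180+180+2 (layout is an assumption)
AZIMUTH_BINS, POLAR_BINS, ROTATION_BINS, CONF_LOGITS = 360, 180, 180, 2


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def split_heads(logits):
    """Split a flat 722-element list into the four assumed heads."""
    expected = AZIMUTH_BINS + POLAR_BINS + ROTATION_BINS + CONF_LOGITS
    if len(logits) != expected:
        raise ValueError(f"expected {expected} logits, got {len(logits)}")
    a_end = AZIMUTH_BINS
    p_end = a_end + POLAR_BINS
    r_end = p_end + ROTATION_BINS
    return logits[:a_end], logits[a_end:p_end], logits[p_end:r_end], logits[r_end:]


def decode(logits):
    """Pick the most likely bin per angle and a confidence in [0, 1]."""
    azimuth, polar, rotation, conf = split_heads(logits)
    return {
        "azimuth_bin": argmax(azimuth),
        "polar_bin": argmax(polar),
        "rotation_bin": argmax(rotation),
        # Assumption: the first of the two logits is the positive class
        "confidence": softmax(conf)[0],
    }
```

In practice `get_3angle` already returns decoded values, so a sketch like this is only useful for reasoning about the raw model output.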

Best Practice

To avoid ambiguity, our model only supports images that contain a single object. Everyday images usually contain multiple objects, so we recommend isolating each object with Grounding DINO and predicting its orientation separately.
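A minimal sketch of this per-object workflow is shown below. It assumes that an object detector (not shown) has already produced bounding boxes as `(left, top, right, bottom)` pixel tuples. The helpers `pad_and_clamp_box` and `predict_per_object`, and the 10% margin, are hypothetical and not part of this repository. The `image` argument is expected to behave like a `PIL.Image.Image`, that is, to expose `.size` and `.crop(box)`.

```python
def pad_and_clamp_box(box, image_size, margin=0.1):
    """Grow a box by `margin` of its size on each side, clamped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin
    return (
        max(0, int(left - pad_x)),
        max(0, int(top - pad_y)),
        min(width, int(right + pad_x)),
        min(height, int(bottom + pad_y)),
    )


def predict_per_object(image, boxes, predict_fn, margin=0.1):
    """Crop each detected object and run `predict_fn` on the crop.

    Returns a list of (crop_box, prediction) pairs, one per input box.
    """
    results = []
    for box in boxes:
        crop_box = pad_and_clamp_box(box, image.size, margin)
        crop = image.crop(crop_box)  # PIL-style crop of one object
        results.append((crop_box, predict_fn(crop)))
    return results
```

With the objects from the script in section 2.2, `predict_fn` could be `lambda crop: get_3angle(crop, dino, val_preprocess, device)`.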

[ToDo]

Test-Time Augmentation

To further improve robustness, we propose a test-time ensemble strategy: the input image is randomly cropped into several variants, and the orientations predicted for these variants are voted on to produce the final prediction. This strategy is implemented in the functions get_3angle_infer_aug() and get_crop_images().
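The idea of cropping and voting can be illustrated with the sketch below. It is not the implementation in `get_3angle_infer_aug()` or `get_crop_images()`; the helper names, the number of crops, the minimum crop scale, and the use of a plain per-angle majority vote are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_crop_boxes(width, height, n_crops=5, min_scale=0.8, seed=0):
    """Return n_crops random (left, top, right, bottom) boxes inside the image."""
    rng = random.Random(seed)  # seeded so the crops are reproducible
    boxes = []
    for _ in range(n_crops):
        scale = rng.uniform(min_scale, 1.0)
        crop_w = max(1, int(width * scale))
        crop_h = max(1, int(height * scale))
        left = rng.randint(0, width - crop_w)
        top = rng.randint(0, height - crop_h)
        boxes.append((left, top, left + crop_w, top + crop_h))
    return boxes


def vote(predictions):
    """Majority vote per angle over a list of (azimuth, polar, rotation) tuples.

    Ties are resolved in favor of the value that appears first.
    """
    return tuple(
        Counter(column).most_common(1)[0][0] for column in zip(*predictions)
    )
```

Each box can be passed to `image.crop(box)` and the crop to `get_3angle`. Rounding the predicted angles to integer degrees before calling `vote` is advisable, since floating-point predictions rarely match exactly.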

Citation

If you find this project useful, please consider citing:

@article{orient_anything,
  title={Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models},
  author={Wang, Zehan and Zhang, Ziang and Pang, Tianyu and Du, Chao and Zhao, Hengshuang and Zhao, Zhou},
  journal={arXiv:2412.18605},
  year={2024}
}

Acknowledgement

Thanks to the following open-source projects: Grounded-Segment-Anything, render-py