Detection of Novel Objects without Fine-tuning in Assembly Scenarios by Class-Agnostic Object Detection and Object Re-Identification
This repository provides the source code for the paper titled "Detection of Novel Objects without Fine-tuning in Assembly Scenarios by Class-Agnostic Object Detection and Object Re-Identification." The focus of this work is on a few-shot-like approach that targets object instances rather than categories. The primary task involves detecting instances of a query object within a gallery of different images based on provided example image(s).
The repository is roughly divided into three main parts:
- Code for Training the Class-Agnostic Object Detection Model: This section corresponds to Section 4.1 of the paper.
- Code for Training the Object Re-Identification Models: This part is detailed in Section 4.2 of the paper.
- Unified Pipeline: This includes both detection and re-identification for a few-shot-like approach, as described in Section 4.3 of the paper.
Additionally, the repository contains modified versions of several other projects, which are integral to its functionality:
- The mmdetection framework is utilized for detection purposes (cite).
- The ReID-Survey project is adapted for ReID (cite).
- The SuperGlobal approach serves as a reference for comparison with our object ReID method (cite).
- The DE-ViT framework is used as a reference for the few-shot object detection approach that our unified pipeline is compared against (cite).
Datasets, benchmarks, annotations, and other necessary resources are provided for download. For further details see the section Download of Image Data, Labels, and Checkpoints below.
IMPORTANT: Users should be aware that many paths in the code will require manual adjustments for proper functionality. Some paths are placeholders (e.g., `/path/to/config` or `/home/user`) and need to be updated accordingly. Others are relative paths (e.g., `paper/checkpoints`) that must be completed to ensure the code runs smoothly.
If you use this code or the provided data, please cite the paper:

@article{eisenbach2024detection,
author = {Eisenbach, Markus and Franke, Henning and Franze, Erik and Köhler, Mona and Aganian, Dustin and Seichter, Daniel and Gross, Horst-Michael},
title = {Detection of Novel Objects without Fine-Tuning in Assembly Scenarios by Class-Agnostic Object Detection and Object Re-Identification},
journal = {Automation},
volume = {5},
year = {2024},
number = {3},
pages = {373--406},
url = {https://www.mdpi.com/2673-4052/5/3/23},
issn = {2673-4052},
doi = {10.3390/automation5030023}
}
To apply the code provided in this repository, additional data is required. All data needed to train and evaluate the models can be downloaded here.
The zip file contains the following folders, which are referenced below:
- `paper/attach-benchmark` - Object labels for images of the ATTACH dataset and selected images for benchmarking
- `paper/ikea-benchmark` - Object labels for images of the IKEA assembly dataset and selected images for benchmarking
- `paper/imgs_cropped` - Image crops of the ATTACH dataset for benchmarking the class-agnostic object detector
- `paper/imgs_full` - Full-size images of the ATTACH dataset for benchmarking the unified pipeline
- `paper/queries` - Query images representing the few shots to introduce each category
- `paper/reid_datasets` - Lists of image files from different datasets (CO3D, ATTACH, Redwood, Google Scanned Objects, KTH Handtool Dataset, OHO, Workinghands) used to compile the different versions of the dataset for training the object re-identification model and for benchmarking. Please note that you need to download the images of the individual datasets yourself in order to compile the object re-identification dataset. If you use these datasets, please cite them.
- `paper/stat_images` - Images of the ATTACH dataset used to model the background in order to extract object detection thresholds for each of the introduced novel categories
Additionally, the full-sized images of the ATTACH dataset for which objects have been annotated can be downloaded here. If you use these images, please cite the ATTACH dataset.
The checkpoints of the best trained models in this repository can be downloaded here.
The zip file contains a `checkpoints` folder, which should be moved into the `paper` folder if you want to apply the code as described below; there, this folder is referred to as `paper/checkpoints`.
In the following, the class-agnostic object detection, also referred to as CAOD in this README, is explained. The class-agnostic detection code is located in the `mmdetection` directory, which contains a modified version of the mmdetection repository. This part of the repository is used for the class-agnostic detection experiments in the paper. The DINO model has been trained for this purpose.
To install the necessary components, please follow the README instructions provided in the mmdetection repository. The trained DINO model checkpoint and inference configuration can be found in `paper/checkpoints`.
For training and evaluation, users should refer to the README in the mmdetection repository. The relevant scripts are `mmdetection/tools/train.py`, `mmdetection/tools/test.py`, and `mmdetection/tools/eval_pkl.py`. The DINO configuration used for the paper is located at `mmdetection/trainings_configs/dino-4scale_r50_8xb2-12e_coco.py`.
Inference procedures can also be found in the mmdetection README. The API for inference is located at `mmdetection/mmdet/apis/inference.py` and is used later in the unified pipeline.
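For orientation, the following is a minimal sketch of running the trained CAOD model through the standard mmdetection 3.x inference API; the checkpoint filename and the example image are placeholders and need to be replaced with the actual files from `paper/checkpoints` and your own data.

```python
# Minimal sketch: class-agnostic inference with the mmdetection 3.x API.
# The checkpoint filename and the example image below are placeholders.
from mmdet.apis import init_detector, inference_detector

config_file = 'mmdetection/trainings_configs/dino-4scale_r50_8xb2-12e_coco.py'
checkpoint_file = 'paper/checkpoints/dino_checkpoint.pth'  # placeholder filename

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'paper/imgs_cropped/example.jpg')  # placeholder image
print(result.pred_instances.bboxes)  # predicted (class-agnostic) boxes
print(result.pred_instances.scores)  # objectness scores
```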
The benchmark images are located in `paper/imgs_cropped` and are part of the ATTACH dataset. These images are used for the evaluation in the paper with the DINO model from `paper/checkpoints`.
The Object ReID functionality is located in the `object-reid` directory and is specifically designed for training object ReID models. This part of the repository extends the ReID-Survey repository, and its usage is analogous to that of the original repository. The behavior of the system is entirely determined by the loaded configuration files, which are documented in `object-reid/Object-ReID/config/defaults.py`.
This directory also contains a modified version of the SuperGlobal repository, which serves as a benchmark for comparison.
Two ReID models have been trained for the final pipeline: one for the initial region of interest (RoI) proposal step and another for the final ReID process.
To install the necessary components, please follow the instructions in the README file located in the `object-reid` directory. The datasets can be found at `paper/reid_datasets`, which includes dataset annotations for ReID purposes. However, users must independently download the original dataset images. The datasets are documented in `object-reid/Object-ReID/data/datasets/object_reid_datasets.py` and `object-reid/Object-ReID/data/datasets/tool_datasets.py`.
Trained model checkpoints and configurations are available in `paper/checkpoints/cp_*`. The model `cp_regular` is used solely for the RoI proposal, while `cp_nl` is used for the actual ReID process.
For training and evaluation, refer to the README in the `object-reid` directory. The main script for this process is located at `object-reid/Object-ReID/tools/main.py`. Example configurations used for the experiments can be found in `object-reid/Object-ReID/configs`.
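As a minimal sketch of this configuration-driven setup, assuming the yacs-style defaults of the original ReID-Survey code (the `config` import path and the configuration filename are assumptions; run from within `object-reid/Object-ReID`):

```python
# Minimal sketch, assuming yacs-style defaults as in the original ReID-Survey code.
# Run from within object-reid/Object-ReID so that the `config` package is importable.
from config import cfg  # default options documented in config/defaults.py

cfg.merge_from_file('configs/example_experiment.yml')  # placeholder filename
cfg.freeze()
print(cfg)  # the merged configuration fully determines the behavior of a run
```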
Inference instructions are also detailed in the README of the `object-reid` directory. The inference process is controlled via the INFERENCE section of the configuration file; an example is available at `object-reid/Object-ReID/configs/Inference/inference.yml`. This functionality is more of an afterthought and is primarily used for visualization purposes.
A benchmark comparison is conducted with the CBIR method SuperGlobal. This comparison uses the `object-reid/SuperGlobal` directory, with the relevant script located at `object-reid/Object-ReID/tools/test_cbir.py`. Configuration files are also employed in this comparison, with examples available in `object-reid/Object-ReID/configs/CBIR`.
The final pipeline for the few-shot detection-like approach utilizes trained models from both the detection and ReID sections. Specifically, it employs:
- The first ReID model for Region of Interest (RoI) proposals.
- The Class-Agnostic Object Detection (CAOD) model to identify all objects within the proposed RoIs.
- A second ReID model to recognize the query object.
This unified pipeline is located in the `object-reid/Object-ReID/caod` directory; the rest of the `object-reid/Object-ReID` directory is not required for its operation. To avoid confusion: the legacy term "comparison images" is sometimes used interchangeably with "queries".
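To illustrate the final re-identification step conceptually (this is an illustrative sketch with generic tensor operations, not the repository's API): embeddings of the detected crops are compared to the query embeddings, and detections that are sufficiently similar to any query shot are accepted as instances of the query object.

```python
# Illustrative sketch of the final ReID step (not the repository's API):
# rank detected crops by cosine similarity to the query embeddings.
import torch
import torch.nn.functional as F

def match_query(crop_embeddings: torch.Tensor,   # (N, D) embeddings of detected crops
                query_embeddings: torch.Tensor,  # (Q, D) embeddings of the query shots
                threshold: float) -> torch.Tensor:
    """Return indices of crops whose best similarity to any query shot exceeds the threshold."""
    crops = F.normalize(crop_embeddings, dim=1)
    queries = F.normalize(query_embeddings, dim=1)
    similarity = crops @ queries.T      # (N, Q) cosine similarities
    best, _ = similarity.max(dim=1)     # best match over the provided shots
    return torch.nonzero(best > threshold).squeeze(1)

# Example with random embeddings in place of real ReID features:
print(match_query(torch.randn(8, 128), torch.randn(3, 128), threshold=0.5))
```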
To set up the environment, follow these steps:
- Install the conda environment using the `environment.yml` file.
- Install mmdetection:
  - Note that you should not use the `mmdetection` directory from this repository, as it has not been tested for this purpose. Instead, an independent installation of version 3.0.0 is recommended.
- Trained model checkpoints and configuration files can be found in `paper/checkpoints`.
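An optional sanity check that the independent mmdetection installation matches the recommended version before running the pipeline:

```python
# Optional sanity check: verify the independently installed mmdetection version.
import mmdet

print(mmdet.__version__)
assert mmdet.__version__ == '3.0.0', 'version 3.0.0 of mmdetection is recommended for the pipeline'
```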
To use the pipeline, run the script located at `object-reid/Object-ReID/caod/main.py`. The parameters for this script are documented within the file, and example parameter values can be found in Section 4.3 of the paper. For information on the annotation file formats, refer to the examples in `paper/attach-benchmark`.
Required inputs for the pipeline include:
- Gallery image files
- Annotation file for gallery images
- Query image files
- Annotation file for query images
- Dataset background image files (stat images, used to determine threshold parameters)
- Annotation file for stat images
- Configuration files and checkpoints for the first ReID model, CAOD model, and second ReID model
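For intuition, one plausible way the background (stat) images could be turned into a per-query detection threshold is sketched below: the threshold is placed just above the highest similarity that the query reaches on detections from object-free background images. This is an illustrative assumption, not necessarily the exact procedure implemented in `object-reid/Object-ReID/caod/main.py`.

```python
# Illustrative sketch (assumption, not the exact implementation): derive a per-query
# threshold from the similarity scores of the query against background-image crops.
import torch
import torch.nn.functional as F

def background_threshold(query_embedding: torch.Tensor,        # (D,) embedding of one query
                         background_embeddings: torch.Tensor,  # (M, D) embeddings of crops from stat images
                         margin: float = 0.05) -> float:
    query = F.normalize(query_embedding, dim=0)
    background = F.normalize(background_embeddings, dim=1)
    scores = background @ query  # cosine similarity of every background crop to the query
    return scores.max().item() + margin

print(background_threshold(torch.randn(128), torch.randn(20, 128)))
```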
In the paper, two benchmark datasets were used: ATTACH and IKEA-ASM.
- ATTACH:
  - Annotations used by the pipeline can be found in `paper/attach-benchmark/`.
  - Gallery images are located in `paper/imgs_full/table`.
  - Full query images are stored in `paper/queries`.
  - The script `paper/attach-benchmark/create_query_crops.py` can be used to create cropped query images from the full images.
  - Background images for the dataset are in `paper/stat_images`; they are used to determine the threshold parameters.
  - Evaluation statistics for ATTACH are calculated and saved automatically when running the benchmark with `main.py`.
- IKEA-ASM:
  - Annotations used by the pipeline can be found in `paper/ikea-benchmark/`.
  - The annotations refer to the directory structure of the downloaded IKEA-ASM dataset and function analogously to the ATTACH annotations.
  - The script `paper/ikea-benchmark/collect_ikea_queries.py` can be used to create cropped query images from the full images.
  - The script `paper/calc_ikea_stats.py` can be used to calculate the overall evaluation statistics from multiple results.
IMPORTANT: The `main.py` script contains some code in lines 146-152 that is specific to the benchmark dataset used and requires manual editing. Editing it may be necessary to replicate the results from the paper.
A benchmark comparison with the few-shot object detection (FSOD) method "Detect Every Thing with Few Examples" is available in the `devit` directory. To get started, follow the instructions in their README for installation and for downloading the necessary checkpoints. For the ATTACH benchmark, use the script located at `devit/tools/eval_reid.py`; for the IKEA benchmark, use `devit/tools/eval_reid_ikea.py`. The modified configuration files can be found in the `devit/configs/few-shot` directory, and the FSOD settings, such as the number of shots, can be supplied via the command line.
This project is released under the MIT license. The code it is based on, the ReID-Survey project, is also released under the MIT license. Please note that the mmdetection framework, which is used for detection, is released under the Apache-2.0 license.