Hi,
Sorry for the late reply.
As far as I can remember, the evaluation results of Grounding DINO can be heavily affected by the inference-time hyper-parameters, in particular some post-processing tricks.
Please refer to these options in the script:
parser.add_argument("--box_threshold", type=float, default=0.3, help="box threshold")
parser.add_argument("--text_threshold", type=float, default=0.25, help="text threshold")
parser.add_argument("--img-top1", action="store_true", help="select only the box with top max score")
In our paper, we reported the best Grounding DINO result we could obtain with these tricks.
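For anyone tuning these options, here is a minimal sketch (not the authors' exact eval_sota/groundingdino.py code) of how box_threshold and --img-top1 style filtering is typically applied to Grounding-DINO-style outputs. The tensor names and shapes are assumptions; text_threshold is normally used afterwards to decode phrase labels from the per-token activations, which is omitted here.

import torch

def postprocess_boxes(pred_logits: torch.Tensor,   # (num_queries, num_text_tokens), raw logits
                      pred_boxes: torch.Tensor,    # (num_queries, 4), cxcywh normalized to [0, 1]
                      box_threshold: float = 0.3,
                      img_top1: bool = False):
    """Filter predicted boxes by score; optionally keep only the single best box per image."""
    # Per-query confidence: the strongest text-token activation after sigmoid.
    scores = pred_logits.sigmoid().max(dim=-1).values          # (num_queries,)

    # Drop low-confidence queries (the --box_threshold trick).
    keep = scores > box_threshold
    scores, boxes = scores[keep], pred_boxes[keep]

    # Optionally keep only the top-scoring box for the image (the --img-top1 trick).
    if img_top1 and scores.numel() > 0:
        best = int(scores.argmax())
        scores, boxes = scores[best:best + 1], boxes[best:best + 1]

    return boxes, scores

Whether --img-top1 helps depends on the evaluation setting; it mainly suppresses duplicate or spurious detections when at most one instance per description is expected.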
Hi authors,
Thanks for your interesting work.
I have tried to reproduce the results reported on the leaderboard using your code at https://github.com/shikras/d-cube/blob/main/eval_sota/groundingdino.py, but I was unsuccessful.
Could you give me any tips for reproducing them?
Thank you in advance!
My results are as follows:
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=65.70s).
Accumulating evaluation results...
DONE (t=11.63s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.073
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.082
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.076
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.052
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.090
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.184
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.184
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.184
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.073
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.228
loading annotations into memory...
Done (t=0.07s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.05s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=46.08s).
Accumulating evaluation results...
DONE (t=8.58s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.066
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.074
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.069
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.049
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.079
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.180
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.180
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.180
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.067
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.221
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.05s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=14.40s).
Accumulating evaluation results...
DONE (t=2.96s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.095
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.107
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.098
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.060
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.122
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.194
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.194
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.194
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.090
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.248
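For context, the three blocks above are standard pycocotools COCOeval summaries, likely one per evaluation setting of the benchmark. A minimal sketch of how such a summary is produced, with placeholder file names (the actual paths come from the d-cube toolkit and the detector's output), is:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths, for illustration only.
coco_gt = COCO("d3_annotations.json")                      # "loading annotations into memory..."
coco_dt = coco_gt.loadRes("groundingdino_results.json")    # "Loading and preparing results..."

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()     # "Running per image evaluation..."
evaluator.accumulate()   # "Accumulating evaluation results..."
evaluator.summarize()    # prints the AP/AR table shown above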