Estimated time it takes to fine-tune grounded SAM2 model? #60

GeorgiaA · 2024-10-25T15:44:17Z


Hello,

I am working with grounded SAM2 to detect objects in underwater images, but it needs fine-tuning to adapt to an underwater setting. My company is asking for an estimate of how much fine-tuning would cost. I plan to use cloud resources at first to estimate future costs, so I need a rough idea of how many hours it would take to fine-tune grounded SAM2. I plan to use an A100 GPU with 80GB memory, as suggested in the SAM2 fine-tuning/training documentation. I have around 2,500 images that I could use.

Has anyone fine-tuned grounded SAM2 before? If so, approximately how long did it take, how much data were you using, and how many epochs did you fine-tune for?

Any information would be greatly appreciated!

Thanks,
Georgia

rentainhe · 2024-10-28T08:11:31Z

Maintainer

Thanks a lot for your attention to our work @GeorgiaA

I was wondering what the bottleneck of Grounded SAM 2 is in your underwater scenarios. We combine a grounding model with a tracking model in this pipeline, so if the zero-shot performance of the grounding model is good in your scenes, you don't have to fine-tune that part. If you're facing tracking issues in SAM 2, you only have to fine-tune SAM 2 on your own images with tracking labels.

You can also replace the grounding model with another detector such as YOLOv11 (if you only need to detect a fixed set of classes) and fine-tune YOLO on your own scenarios, which is a more effective way.
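To illustrate the idea of swapping the detector, the pipeline can be thought of as two stages behind small interfaces: a detector that proposes boxes and a segmenter that is prompted with each box. The sketch below is hypothetical: `Detector`, `Segmenter`, `run_pipeline`, the stub classes, the `"fish"` label and the `min_score` threshold are made-up names and values for illustration, not part of the Grounded SAM 2 repository or any detector library.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    box: Box
    label: str
    score: float


class Detector(Protocol):
    """Anything that turns an image into scored boxes (hypothetical interface)."""

    def detect(self, image: object) -> Sequence[Detection]: ...


class Segmenter(Protocol):
    """Anything that turns an image plus a box prompt into a mask (hypothetical interface)."""

    def segment(self, image: object, box: Box) -> object: ...


def run_pipeline(image, detector: Detector, segmenter: Segmenter, min_score: float = 0.3):
    """Detect boxes, then prompt the segmenter with each sufficiently confident box."""
    results = []
    for det in detector.detect(image):
        if det.score >= min_score:
            results.append((det, segmenter.segment(image, det.box)))
    return results


# Stubs standing in for real models; a fine-tuned closed-set detector or an
# open-vocabulary grounding model would both fit behind the same interface.
class FixedClassDetector:
    def detect(self, image):
        return [
            Detection((10.0, 20.0, 110.0, 220.0), "fish", 0.9),
            Detection((0.0, 0.0, 5.0, 5.0), "fish", 0.1),  # dropped by min_score
        ]


class BoxMaskSegmenter:
    def segment(self, image, box):
        return {"mask_from_box": box}  # placeholder for a real mask


results = run_pipeline(image=None, detector=FixedClassDetector(), segmenter=BoxMaskSegmenter())
```

Because the segmenter only sees boxes, the two stages can be fine-tuned and costed independently.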

If you want to use a fine-tuned grounding model, you can try some open-source solutions for fine-tuning grounding models (because we did not release our training code): MM-Grounding-DINO or YOLO-World.

I hope these suggestions help.

2 replies

GeorgiaA Oct 28, 2024
Author

Hi @rentainhe, thank you for this. I hadn't realised this! It's an issue with both parts. Sometimes Grounding DINO places the bounding box in the wrong place, and sometimes SAM 2 does not correctly segment the object within the bounding box. I know that the SAM 2 model does need fine-tuning, as when using SAM 2 on its own it was not adapting well to the underwater scenes. SAM 2 works well at segmenting underwater objects on the online demo that Meta released, but that model is trained on many open-source datasets, including underwater datasets, unlike the model checkpoints they have shared on their GitHub repo, which are only trained on their dataset. So this suggests to me that I need to fine-tune it with some underwater data.

Thank you for clarifying that I can swap out the object detection model -- I had not realised that.

Do you have any rough estimates of how long it took you to train/fine-tune each part of the model (if you did that), e.g. the SAM 2 model and the Grounding DINO model?

Thanks for the help.
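One way to see which stage is the bottleneck is to score the two stages separately on a few hand-labelled images: box IoU for the detector, and mask IoU for SAM 2 when it is prompted with the labelled box, so that detector errors do not leak into the segmentation score. The sketch below covers only the metric part in plain Python. The function names and toy inputs are illustrative assumptions, not code from either repository.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


# Toy values only; real inputs would be model outputs and hand labels.
predicted_box = (0.0, 0.0, 10.0, 10.0)
labelled_box = (5.0, 0.0, 15.0, 10.0)
detector_score = box_iou(predicted_box, labelled_box)

predicted_mask = [[1, 1], [0, 0]]
labelled_mask = [[1, 0], [1, 0]]
segmenter_score = mask_iou(predicted_mask, labelled_mask)
```

If the box scores are consistently low, the detector is the weak stage. If the boxes are good but the mask scores are low, the segmentation stage is the one to fine-tune.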

rentainhe Oct 30, 2024
Maintainer

Dear @GeorgiaA, I think fine-tuning SAM 2 for underwater scenarios won't take too many resources. They have already released their training/fine-tuning code, which you can apply directly to your own dataset: load the pretrained checkpoint and fine-tune it on your own data. I also think fine-tuning Grounding DINO won't take too much time, because your dataset is not too large.
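Since the thread gives no measured training times, a back-of-the-envelope estimate for the original cost question could be sketched as below. Only the 2,500-image dataset size comes from the question. The epoch count, throughput, hourly rate and overhead factor are placeholders, and the function name is hypothetical. The throughput in particular would need to be measured with a short trial run on the target GPU.

```python
from dataclasses import dataclass


@dataclass
class FineTuneEstimate:
    gpu_hours: float
    cost: float


def estimate_fine_tune_cost(
    num_images: int,
    epochs: int,
    images_per_second: float,
    hourly_rate: float,
    overhead_factor: float = 1.0,
) -> FineTuneEstimate:
    """Estimate GPU hours and cost for one fine-tuning run.

    images_per_second should come from timing a short trial run on the
    target GPU; hourly_rate from the cloud provider's price list.
    """
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    seconds = num_images * epochs / images_per_second
    gpu_hours = seconds / 3600 * overhead_factor
    return FineTuneEstimate(gpu_hours=gpu_hours, cost=gpu_hours * hourly_rate)


# 2,500 images is from the question; every other number is a placeholder.
estimate = estimate_fine_tune_cost(
    num_images=2500,
    epochs=40,              # placeholder
    images_per_second=2.0,  # placeholder: measure on a trial run
    hourly_rate=4.0,        # placeholder: provider price per GPU-hour
    overhead_factor=1.2,    # placeholder: validation, checkpointing, restarts
)
print(f"{estimate.gpu_hours:.1f} GPU-hours, cost {estimate.cost:.2f}")
```

Running the same calculation once per stage (SAM 2, and the detector if it is fine-tuned too) and multiplying by the number of planned experiments gives a rough budget range.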
