-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: countgd sam2 video support #318
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! just some minor comments
OWLV2 = "owlv2" | ||
|
||
|
||
def od_sam2_video_tracking( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
vision_agent/tools/tools.py
Outdated
"""'countgd_sam2_video_tracking' is a tool that can segment multiple objects given a text | ||
prompt such as category names or referring expressions. The categories in the text | ||
prompt are separated by commas. It returns a list of bounding boxes, label names, | ||
mask file names and associated probability scores of 1.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minro comment, only florence2 returns probability scores of 1.0, countgd and owlv2 will can return regular probability scores. So you can just say "and associated probability scores."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
vision_agent/tools/tools.py
Outdated
"""'owlv2_sam2_video_tracking' is a tool that can segment multiple objects given a text | ||
prompt such as category names or referring expressions. The categories in the text | ||
prompt are separated by commas. It returns a list of bounding boxes, label names, | ||
mask file names and associated probability scores of 1.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above on prob scores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
vision_agent/tools/tools.py
Outdated
List[Dict[str, Any]]: A list of dictionaries containing the score, label, | ||
bounding box, and mask of the detected objects with normalized coordinates | ||
(xmin, ymin, xmax, ymax). xmin and ymin are the coordinates of the top-left | ||
and xmax and ymax are the coordinates of the bottom-right of the bounding box. | ||
The mask is binary 2D numpy array where 1 indicates the object and 0 indicates | ||
the background. | ||
|
||
Example | ||
------- | ||
>>> countgd_sam2_video_tracking("car, dinosaur", image) | ||
[ | ||
{ | ||
'score': 1.0, | ||
'label': 'dinosaur', | ||
'bbox': [0.1, 0.11, 0.35, 0.4], | ||
'mask': array([[0, 0, 0, ..., 0, 0, 0], | ||
[0, 0, 0, ..., 0, 0, 0], | ||
..., | ||
[0, 0, 0, ..., 0, 0, 0], | ||
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8), | ||
}, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the return values and examples from florence2_sam2_video_tracking
. It's actually a list of list of dictionaries where the inner list is a frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
vision_agent/tools/tools.py
Outdated
Returns: | ||
List[Dict[str, Any]]: A list of dictionaries containing the score, label, | ||
bounding box, and mask of the detected objects with normalized coordinates | ||
(xmin, ymin, xmax, ymax). xmin and ymin are the coordinates of the top-left | ||
and xmax and ymax are the coordinates of the bottom-right of the bounding box. | ||
The mask is binary 2D numpy array where 1 indicates the object and 0 indicates | ||
the background. | ||
|
||
Example | ||
------- | ||
>>> countgd_sam2_video_tracking("car, dinosaur", image) | ||
[ | ||
{ | ||
'score': 1.0, | ||
'label': 'dinosaur', | ||
'bbox': [0.1, 0.11, 0.35, 0.4], | ||
'mask': array([[0, 0, 0, ..., 0, 0, 0], | ||
[0, 0, 0, ..., 0, 0, 0], | ||
..., | ||
[0, 0, 0, ..., 0, 0, 0], | ||
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8), | ||
}, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above on return comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Added
countgd_sam2_video_tracking
tool, to usecountgd
for object detection and pass the results tosam2
to track the objects in the entire video.Link to Colab
Note to the reviewer:
mypy
complains when using too generic parameters.