Inference speed #8
Comments
To increase inference speed, you need to reduce the number of patches that are created. To make this easier to tune, the MakeCropsDetectThem class has a show_crops=True argument that displays the generated patches. Regarding speed, you are correct: it is not exceptionally high, because patch-based inference is primarily developed not for real-time tasks but for projects that process high-resolution photos.

Our key distinctions from SAHI: support for instance segmentation with two levels of detection quality (less accurate but light on memory, or more accurate but more resource-intensive), an improved algorithm for suppressing duplicate detections where crops intersect (thanks to additional sorting by the size of detected objects), a user-friendly interface with extensive options for selecting optimal parameters, and support for the most current models (everything provided by ultralytics: YOLOv8, YOLOv8-seg, YOLOv9, YOLOv9-seg, YOLOv10, FastSAM, and RTDETR).
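As a rough illustration of how patch size and overlap drive the patch count, here is a minimal sketch. It assumes a plain sliding-window tiling with a step of `shape * (100 - overlap) / 100`; the library's own cropping logic may differ, and `count_patches` is a hypothetical helper, not part of patched_yolo_infer.

```python
import math


def count_patches(width, height, shape_x, shape_y, overlap_x, overlap_y):
    """Estimate the number of crops for a sliding-window tiling.

    Assumption: the window advances by shape * (100 - overlap) / 100 pixels
    along each axis; overlaps are percentages, as in MakeCropsDetectThem.
    """
    def along_axis(size, shape, overlap):
        if size <= shape:
            return 1  # a single patch already covers the whole axis
        step = max(1, int(shape * (100 - overlap) / 100))
        return math.ceil((size - shape) / step) + 1

    return along_axis(width, shape_x, overlap_x) * along_axis(height, shape_y, overlap_y)


# Full HD frame with the patch settings used later in this thread
small_patches = count_patches(1920, 1080, 640, 500, 35, 35)
# Same frame with patches 1.5x larger in each dimension
large_patches = count_patches(1920, 1080, 960, 750, 35, 35)
print(small_patches, large_patches)
```

Under these assumptions, enlarging the patches or lowering the overlap cuts the number of model calls per frame, at the cost of detection quality on small objects.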
Thanks for the great library @Koldim2001! I am the creator of SAHI and an advisor at Ultralytics. I wanted to share some updates: SAHI now supports instance segmentation with a memory-efficient implementation for Ultralytics models and includes non-maximum-merging (NMM) for eliminating duplicate detections. 👍🏻
@fcakyon It is very nice to receive a positive review from you. My colleague and I were inspired by your project when creating this library! Thanks for adding instance segmentation. We will be in touch! 👍🏻
@poriop Good afternoon! I have some great news: the new version of the library can process all patches (crops) in one batch, which significantly increases inference speed. To take advantage of this functionality, you need to update the library ->

and then, when initializing the

To use the library for processing video streams, first configure all parameters on one frame (adjust the patch size based on the original image size and the required accuracy) and then apply them to all frames in the stream. To conveniently set everything up initially, I recommend using the

Logically, the more patches there are, the slower the inference on each frame. For example, with 16 patches per frame on an RTX 3080 Laptop, it is possible to achieve 8 fps for yolov8 detection and yolov8-seg instance segmentation.

Important information: the latest update to the library also accepts any ultralytics detection or instance segmentation model converted to TensorRT, which increases fps by a further 1.5 times (more than 12 fps for the detection task with 16 patches). When converting, you need to set the batch parameter to the number of generated patches:
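The original snippet for the conversion step did not survive in this thread, so the following is only a sketch of what such an export could look like, based on the ultralytics `export` arguments `format`, `batch`, and `imgsz`. The helper names `tensorrt_export_kwargs` and `export_engine` are hypothetical, and the weights file name is an assumption.

```python
def tensorrt_export_kwargs(n_patches, imgsz=640):
    """Build export arguments so the engine batch size matches the patch count."""
    if n_patches < 1:
        raise ValueError("n_patches must be a positive integer")
    return {"format": "engine", "batch": n_patches, "imgsz": imgsz}


def export_engine(weights, n_patches, imgsz=640):
    """Export an ultralytics model to TensorRT (requires a CUDA GPU and TensorRT)."""
    # Imported lazily so the kwargs helper works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.export(**tensorrt_export_kwargs(n_patches, imgsz))


# Example (not executed here): 16 patches per frame, as in the benchmark above
# export_engine("yolov8m-seg.pt", n_patches=16)
```

The resulting `.engine` file would then be loaded with `YOLO(...)` in place of the `.pt` weights.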
Below is an example of how you can write code to implement video stream processing and save the final processed video with the results of patch-based instance segmentation inference (as in the example from the GIF of the previous comment):

    import cv2
    from ultralytics import YOLO
    from patched_yolo_infer import MakeCropsDetectThem, CombineDetections, visualize_results

    # Load the YOLOv8 model
    model = YOLO("yolov8m-seg.pt")  # or yolov8m-seg.engine in case of TensorRT

    # Open the video file
    cap = cv2.VideoCapture("video.mp4")

    # Check if the video file was successfully opened
    if not cap.isOpened():
        exit()

    # Get the frames per second (fps) of the video
    fps = cap.get(cv2.CAP_PROP_FPS)

    # Get the width and height of the video frames
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Define the codec and create VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for MP4
    out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))

    while True:
        # Read a frame from the video
        ret, frame = cap.read()

        # Break the loop if there are no more frames
        if not ret:
            break

        # Detect elements in the frame using the YOLOv8 model
        element_crops = MakeCropsDetectThem(
            image=frame,
            model=model,
            segment=True,
            shape_x=640,
            shape_y=500,
            overlap_x=35,
            overlap_y=35,
            conf=0.2,
            iou=0.75,
            imgsz=640,
            resize_initial_size=True,
            show_crops=False,
            batch_inference=True,
            classes_list=[0, 1, 2, 3, 4, 5, 6]
        )

        # Combine the detections from the different crops
        result = CombineDetections(element_crops, nms_threshold=0.2, match_metric='IOS')

        # Visualize the results on the frame
        frame = visualize_results(
            img=result.image,
            confidences=result.filtered_confidences,
            boxes=result.filtered_boxes,
            polygons=result.filtered_polygons,
            classes_ids=result.filtered_classes_id,
            classes_names=result.filtered_classes_names,
            segment=True,
            thickness=3,
            show_boxes=False,
            fill_mask=True,
            show_class=False,
            alpha=1,
            return_image_array=True
        )

        # Resize the frame for display (dsize=None lets fx/fy set the scale)
        scale = 0.5
        frame_resized = cv2.resize(frame, None, fx=scale, fy=scale)

        # Display the frame
        cv2.imshow('video', frame_resized)

        # Write the full-size frame to the output video file
        out.write(frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the video capture and writer objects
    cap.release()
    out.release()

    # Close all OpenCV windows
    cv2.destroyAllWindows()
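The call to `CombineDetections` in the example uses `match_metric='IOS'`. As a hedged illustration of why that choice matters for objects split across crops, the sketch below compares intersection over union (IoU) with intersection over smaller area (IoS) for axis-aligned boxes. It assumes boxes in `(x1, y1, x2, y2)` format and that IOS means the intersection divided by the smaller box area; the helper functions are hypothetical and are not the library's implementation.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have zero area."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def intersection_area(a, b):
    """Area of the overlap between two (x1, y1, x2, y2) boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def iou(a, b):
    """Intersection over union."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def ios(a, b):
    """Intersection over the smaller of the two box areas."""
    smaller = min(box_area(a), box_area(b))
    return intersection_area(a, b) / smaller if smaller > 0 else 0.0


# A full object and a fragment of it detected in a neighbouring crop
whole = (0, 0, 10, 10)
fragment = (2, 2, 6, 6)
print(iou(whole, fragment), ios(whole, fragment))
```

In this example the fragment lies entirely inside the larger box, so its IoS is 1.0 while its IoU is only 0.16. A threshold such as 0.2 would therefore suppress the fragment under IoS but keep it under IoU.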
Hi. Great job! I just have a question about increasing inference speed.
If I understand correctly, this code processes one crop at a time in a for loop, so this part is not optimized? Right now I don't see any difference from the SAHI library.