RGB Pointcloud slows down the publishing rate a lot #56
Comments
Confirmed this is occurring on my ROS Melodic, Ubuntu 18.04 machine. I suspect the main image processing loop is running too slowly on the CPU, since the K4A Viewer does the RGB point cloud math on the GPU. I'll investigate ways I could do this faster, potentially using SIMD instructions. |
Confirmed: the image processing loop takes between 40 and 70 ms when the RGB point cloud is enabled. I'll look for ways to potentially speed it up. |
@skalldri
Today I saw this example in the sensor sdk repository: Tomorrow I am planning to use the CPU version of this algorithm in the node to check its performance compared to the current version. |
The fast point-cloud example does not produce RGB point clouds, just regular point clouds. It's just a way to produce point clouds quickly without using SSE3. My documentation in usage.md is incorrect: we use SSE3-accelerated point cloud math, not GPU-accelerated. See: Yes: K4A Viewer does this all on the GPU. Since K4A viewer is going to be drawing everything to the screen as the final destination (rather than writing it to a ROS message), it makes sense for it to do as much work as possible in a shader. They wrote a shader that accepts a depth image and color image as input, as well as the X-Y tables, and then naively draws the RGB point cloud. That's a very fast operation so it runs super quickly. Copying the point cloud and RGB frame to the GPU, doing some work, then copying it back off the GPU is going to take time. I'd like to take a stab at improving the speed of the CPU-based version before resorting to |
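For reference, the X-Y table idea mentioned above can be sketched on the CPU roughly as follows. This is only an illustration, not the SDK's or the driver's code: the `Float2`/`Float3` types and both function names are hypothetical, and the table here is built from an idealized pinhole model with no lens distortion, whereas a real table would be derived from the device calibration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Hypothetical helper: per-pixel ray direction at unit depth for an ideal
// pinhole camera (no distortion). Computed once and reused for every frame.
std::vector<Float2> make_pinhole_xy_table(int width, int height,
                                          float fx, float fy,
                                          float cx, float cy) {
    std::vector<Float2> table(static_cast<std::size_t>(width) * height);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            table[static_cast<std::size_t>(v) * width + u] =
                Float2{(u - cx) / fx, (v - cy) / fy};
        }
    }
    return table;
}

// Per frame: scale each table entry by that pixel's depth (millimetres).
// A depth of 0 (no measurement) yields the point (0, 0, 0).
std::vector<Float3> depth_to_points(const std::vector<Float2>& xy_table,
                                    const std::vector<std::uint16_t>& depth_mm) {
    assert(xy_table.size() == depth_mm.size());
    std::vector<Float3> points(depth_mm.size());
    for (std::size_t i = 0; i < depth_mm.size(); ++i) {
        const float z = static_cast<float>(depth_mm[i]);
        points[i] = Float3{xy_table[i].x * z, xy_table[i].y * z, z};
    }
    return points;
}
```

Only the multiply-by-depth loop runs per frame, which is why this approach is cheap. It produces XYZ only, so color still has to be attached in a separate step.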
Thanks for the detailed answer :) I am looking forward to it, because that's one of the main blocking points in our project at the moment. |
@skalldri EDIT: EDIT2: |
Thanks for digging into this issue RoseFlunder. I suspect the iterator used to fill up the PointCloud2 message is very expensive. There is almost certainly a faster way to copy the data from the SDK structures into the PointCloud2 message. Unfortunately, I'm on vacation until the 25th so I won't be able to make much progress on this until I get back. |
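As a rough sketch of what copying without the iterator could look like: the snippet below writes each point straight into a packed byte buffer laid out like a PointCloud2 `data` array with `x`, `y`, `z` (float32) and a 4-byte `rgb` field. Everything here is an assumption for illustration: the `PackedPointXYZRGB` struct, the `fill_cloud_buffer` name, the 16-byte point step, and the input layout (three int16 millimetre values and four BGRA bytes per pixel). Whether it is actually faster than the iterator would need profiling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical 16-byte point: x, y, z as float32, then color bytes in
// b, g, r, padding order (how a packed rgb field sits in little-endian memory).
struct PackedPointXYZRGB {
    float x, y, z;
    std::uint8_t b, g, r, a;
};
static_assert(sizeof(PackedPointXYZRGB) == 16, "expected a 16-byte point");

// xyz_mm: 3 int16 values per point, in millimetres (assumed input layout).
// bgra:   4 bytes per point, already aligned to the same pixel grid.
// Returns the bytes that would become the message's data array.
std::vector<std::uint8_t> fill_cloud_buffer(const std::int16_t* xyz_mm,
                                            const std::uint8_t* bgra,
                                            std::size_t point_count) {
    assert(xyz_mm != nullptr && bgra != nullptr);
    const float nan = std::numeric_limits<float>::quiet_NaN();
    std::vector<std::uint8_t> data(point_count * sizeof(PackedPointXYZRGB));
    std::uint8_t* out = data.data();
    for (std::size_t i = 0; i < point_count; ++i) {
        const std::int16_t* p = xyz_mm + 3 * i;
        const std::uint8_t* c = bgra + 4 * i;
        PackedPointXYZRGB point;
        if (p[2] > 0) {
            // Convert millimetres to metres.
            point.x = p[0] * 0.001f;
            point.y = p[1] * 0.001f;
            point.z = p[2] * 0.001f;
        } else {
            // No valid depth for this pixel: mark the point as invalid.
            point.x = point.y = point.z = nan;
        }
        point.b = c[0];
        point.g = c[1];
        point.r = c[2];
        point.a = 0;
        // One memcpy per point instead of seven separate iterator updates.
        std::memcpy(out + i * sizeof(point), &point, sizeof(point));
    }
    return data;
}
```

In a real node the destination would be the message's own buffer, resized once before the loop, so that no extra copy is needed afterwards.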
Going to start having a look at this. |
EDIT: Never mind, the original code has a simple mistake: not calling ORIGINAL: Well, Valgrind seems to think that using Combined, I'll see if I can build a version of the point cloud functions that doesn't use the iterator. |
It's good to see that this is using |
Hi, I am also affected by this bug, and I was wondering if there had been any progress on that side? |
Using fillColorPointCloud is too slow. There is an open bug microsoft#56 about it. Instead, this publishes only the xyz values in the camera frame, which is what we need. The number of points is equal to h*w of the raw_image, so they can easily be recombined later if needed.
Hello, glad to know that this issue is getting attention and already started to be investigated. Thanks @RoseFlunder @skalldri for sharing your findings! This is indeed a limitation of the current driver implementation that should be addressed to speed up the loop cycle. Both Three questions that I would like to ask regarding this issue:
Thank you in advance for the reply! |
Hi All, |
@skalldri I'm also having issues with the PointCloud2Iterator, although I'm not a user of Kinect Azure. Do you still plan to optimize it? I would be happy to collaborate and find better ways to use it, or come up with improvements. |
Hi @YoshuaNava, the code currently does this work linearly in C++.
We are looking at CUDA for this codepath when available. |
@ooeygui Thank you for your prompt answer. I see. I have continued using the iterator. So far it has allowed me to speed up an old raw deserialization method by 3-5x, which is a good gain for relatively little effort. I'll share any insights I find while I work on this. |
Hi, I am prototyping an application that requires the depth topic on ROS, and it seems to be very slow at 1 Hz. Are there any recent updates on this issue? Thanks |
@ibrachahine Thank you for your interest and for asking here. We have not invested in fixing this bug, but would accept a pull request. Have you tried working with the parameters on the node to optimize for your hardware and scenario? |
Describe the bug
When the rgb_point_cloud is enabled, the performance of the node degrades a lot. Without the colorized point cloud, the node can maintain a steady 30 Hz publishing rate when using 30 FPS with the Kinect.
With the rgb_point_cloud enabled, the publishing rate drops to about 10-12 Hz.
When visualizing the point cloud in RViz, the slow publishing rate leads to a laggy view.
RViz itself maintains a high enough FPS displaying the point cloud, but the data comes in too slowly.
When using the k4aviewer, there is no slowdown in the colorized point cloud view.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Maintain the 30 FPS rate for publishing the rgb pointcloud on machines that can easily view the colorized point cloud at 30 FPS in the k4aviewer.
Desktop (please complete the following information):