Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory safe subscription callback #108

Draft
wants to merge 7 commits into
base: ros2
Choose a base branch
from

Conversation

ewak
Copy link

@ewak ewak commented Jan 11, 2025

Manage the lifetime of the bondStatusCB member function by wrapping it in a lambda that captures a weak_ptr (via public inheritance of std::enable_shared_from_this)

NOTE:
This is a draft to show how it could be done.

It solves a use after free that is being experienced using intraprocess communication in nav2 ros-navigation/navigation2#4691 .

Also being discussed as a bug in rclcpp - ros2/rclcpp#2678 (comment)

Something like createSafeSubscriptionMemFuncCallback could make its way into the rclcpp namespace.

ewak added 2 commits January 11, 2025 10:13
Inherit from std::enable_shared_from_this
and wrap subscription callback in lambda
capturing a weak_ptr using weak_from_this()
to check that Bond object is still
in scope before processing the bondStatusCB
member function.
Create and use helper function to check that
class is still valid before calling member function.
@SteveMacenski
Copy link
Member

Note that beyond the linting problems that the job points out, there's an actual set of test failures in test_callbacks_cpp which probably point to a bug in the implementation / a test that needs a minor update:

02:07:59 1: Test command: /usr/bin/python3 "-u" "/opt/ros/rolling/share/ament_cmake_test/cmake/run_test.py" "/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml" "--package-name" "test_bond" "--output-file" "/tmp/ws/build_isolated/test_bond/ament_cmake_gtest/test_callbacks_cpp.txt" "--command" "/tmp/ws/build_isolated/test_bond/test_callbacks_cpp" "--gtest_output=xml:/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml"
02:07:59 1: Working Directory: /tmp/ws/build_isolated/test_bond
02:07:59 1: Test timeout computed to be: 60
02:07:59 1: -- run_test.py: invoking following command in '/tmp/ws/build_isolated/test_bond':
02:07:59 1:  - /tmp/ws/build_isolated/test_bond/test_callbacks_cpp --gtest_output=xml:/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml
02:07:59 1: Running main() from /opt/ros/rolling/src/gtest_vendor/src/gtest_main.cc
02:07:59 1: [==========] Running 2 tests from 1 test suite.
02:07:59 1: [----------] Global test environment set-up.
02:07:59 1: [----------] 2 tests from TestCallbacksCpp
02:07:59 1: [ RUN      ] TestCallbacksCpp.dieInLifeCallback
02:07:59 1: unknown file: Failure
02:07:59 1: C++ exception with description "bad_weak_ptr" thrown in the test body.
02:07:59 1: 
02:07:59 1: [  FAILED  ] TestCallbacksCpp.dieInLifeCallback (14 ms)
02:07:59 1: [ RUN      ] TestCallbacksCpp.remoteNeverConnects
02:07:59 1: unknown file: Failure
02:07:59 1: C++ exception with description "bad_weak_ptr" thrown in the test body.
02:07:59 1: 
02:07:59 1: [  FAILED  ] TestCallbacksCpp.remoteNeverConnects (5 ms)
02:07:59 1: [----------] 2 tests from TestCallbacksCpp (20 ms total)
02:07:59 1: 
02:07:59 1: [----------] Global test environment tear-down
02:07:59 1: [==========] 2 tests from 1 test suite ran. (27 ms total)
02:07:59 1: [  PASSED  ] 0 tests.
02:07:59 1: [  FAILED  ] 2 tests, listed below:
02:07:59 1: [  FAILED  ] TestCallbacksCpp.dieInLifeCallback
02:07:59 1: [  FAILED  ] TestCallbacksCpp.remoteNeverConnects
02:07:59 1: 
02:07:59 1:  2 FAILED TESTS
02:07:59 1: -- run_test.py: return code 1
02:07:59 1: -- run_test.py: inject classname prefix into gtest result file '/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml'
02:07:59 1: -- run_test.py: verify result file '/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml'
02:07:59 1/7 Test #1: test_callbacks_cpp ...............***Failed    0.21 sec

ewak added 2 commits January 14, 2025 13:57
This is due to the use of createSafeSubscriptionMemFuncCallback
when creating the subscription registering the bondStatusCB member
function
@ewak
Copy link
Author

ewak commented Jan 14, 2025

@SteveMacenski fixed the lint and test. This change requires the Bond to be created as a shared_ptr.

msg->heartbeat_timeout = static_cast<float>(heartbeat_timeout_.seconds());
msg->heartbeat_period = static_cast<float>(heartbeat_period_.seconds());
if (pub_->get_subscription_count() < 1) {return;}
pub_->publish(std::move(msg));
Copy link
Author

@ewak ewak Jan 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SteveMacenski This is another use_intraprocess_comms fix. i.e the Bond Status publisher for use_intraprocess_comms should use a unique msg and check for subscribers.

This allows nav2_system_tests to pass.

NOTE: without this change the following test was failure was very reproducible for me.

colcon test --packages-select nav2_system_tests --event-handlers=console_direct+ --ctest-args -R test_speed_filter_global

ref: ros-navigation/navigation2#4691

Copy link
Member

@SteveMacenski SteveMacenski Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make it pass because of the if (pub_->get_subscription_count() < 1) {return;} return condition or the unique_ptr publisher? I'm curious if this is pattern I should be on the look-out for on MsgT publishers for depedencies, or if this is a special case where we can get around it by skipping publication.

Though this fixes the issue in Bond, if its the latter, I think that should be reported in the rclcpp thread that publishing IPC message instances causes part of the issue. That means something in the publish() for MsgT signatures is broken whereas MsgT::UniquePtr are not (which hopefully would be an easy apples-to-apples fix).

@mjcarroll
Copy link
Member

Just a note that these changes are ABI-breaking (adding members and changing inheritance), so definitely need to land in rolling.

@SteveMacenski
Copy link
Member

SteveMacenski commented Jan 28, 2025

I know that the aim is really to fix rclcpp, not this - so I think this PR is mostly used for an experimental reproduction ground and testing possible application-space solutions for the rclcpp IPC issue.

@ewak : Is this intended to be a candidate for merging for either a short term solution to get Nav2 turning over with IPC and/or should be reviewed by maintainers for inclusion?

Maintainers: This requires creating the object as a shared pointer, similar to rclcpp nodes themselves. Would this be acceptable to maintainers to put that restriction for getting IPC running on nodes that use bond?

I'd like to bridge the gap a bit to figure out next steps here, if there are some outside of rclcpp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants