Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft: avoid race condition in action client #1125

Closed

Conversation

aditya2592
Copy link

@aditya2592 aditya2592 commented May 4, 2023

work in progress PR to address #1123

@aditya2592 aditya2592 changed the title rclpy avoid rance condition avoid race condition in action client May 4, 2023
@aditya2592 aditya2592 changed the base branch from rolling to humble May 4, 2023 02:16
@clalancette
Copy link
Contributor

Thank you for the contribution.

First, can you please retarget this to rolling? We'll review it there first, then we can consider backporting it to humble.

Second, can you please rebase your commits and add in a Signed-off-by line? The DCO bot has to pass for us to accept the commits.

Finally, can you please add some context to the initial description saying exactly what this is fixing? That would help us review it.

@apockill
Copy link
Contributor

apockill commented May 4, 2023

HI @clalancette - I'm just following up to say this PR is related to the bug described in #1123, which I posted yesterday.

To summarize my bug report here: There are race conditions in the "async" functions on the ActionClient for sending goals, cancelling goals, and getting results. The core of the problem is that the underling _client_handle can sometimes finish its request before the sequence_number has been tracked in the _goal_sequence_number_to_goal_id (or any number of other dicts that are initialized for the purpose of tracking ongoing requests of different types).

The symptom ends up being that you have a request that never completes (blocking forever), and you see the following log in console:

Ignoring unexpected goal response. There may be more than one action server for the action '{ACTION_NAME}'

This PR seems to try out a solution where all of those dictionaries for tracking requests on the python side are protected by a lock. I think there might be multiple ways to tackle this.

@aditya2592
Copy link
Author

Apologies for not linking the issue, I am currently still iterating on the fix and the PR is not ready yet. I have it directed to humble to test easily with my setup and planning to redirect to rolling if everything works and fix aligns with recommendation of the repo authors.

@aditya2592 aditya2592 changed the title avoid race condition in action client draft: avoid race condition in action client May 11, 2023
@dcconner
Copy link

dcconner commented May 17, 2023

You might consider doing a squash merge of your development efforts, and then basing your PR off of the single commit.

@fujitatomoya
Copy link
Collaborator

You might consider doing a squash merge of your development efforts, and then basing your PR off of the single commit.

i do not think this is needed since that is own development branch, and we can do so when we take the PR to mainline.

@apockill
Copy link
Contributor

Any thoughts on this proposal as a resolution? #1123 (comment)

If anyone whose familiar with the ActionClient has opinions on their preferred resolution method, I'm happy to make a PR based off of them! This race condition affects humble up to current.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st we need to target this to rolling as @clalancette requested.

@@ -30,7 +30,7 @@
from rclpy.waitable import NumberOfEntities, Waitable

from unique_identifier_msgs.msg import UUID

WAIT_TIMEOUT = 15.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used?

@@ -177,6 +176,11 @@ def __init__(
self._result_sequence_number_to_goal_id = {}
# key: UUID in bytes, value: callback function
self._feedback_callbacks = {}
self._data_lock = threading.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we can introduce the threading lock against the specific member data object, instead of having giant lock in the ActionClient. that can be also more performative.

see https://github.com/ros2/rclcpp/blob/e2965831d51e9be03470cb07f8721012afcade9b/rclcpp_action/src/client.cpp#L112-L119

Comment on lines +180 to +182
self._goal_event = threading.Event()
self._result_event = threading.Event()
self._cancel_event = threading.Event()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used?

@fujitatomoya
Copy link
Collaborator

#1308 replaces this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants