Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asynchronous client calls in a timer cause the timer to never execute a second time. #1018

Open
jwhtkr opened this issue Oct 3, 2022 · 11 comments · May be fixed by #1211
Open

Asynchronous client calls in a timer cause the timer to never execute a second time. #1018

jwhtkr opened this issue Oct 3, 2022 · 11 comments · May be fixed by #1211
Labels

Comments

@jwhtkr
Copy link

jwhtkr commented Oct 3, 2022

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • foxy (the 2022-09-28 release that was just pushed through the apt repos)
  • DDS implementation:
    • Fast-RTPS for sure
    • Any (I think)
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

Use the minimal service server from the examples

from example_interfaces.srv import AddTwoInts

import rclpy
from rclpy.node import Node


class MinimalService(Node):

    def __init__(self):
        super().__init__('minimal_service')
        self.srv = self.create_service(AddTwoInts, 'add_two_ints', self.add_two_ints_callback)

    def add_two_ints_callback(self, request, response):
        response.sum = request.a + request.b
        self.get_logger().info('Incoming request\na: %d b: %d' % (request.a, request.b))

        return response


def main(args=None):
    rclpy.init(args=args)

    minimal_service = MinimalService()

    rclpy.spin(minimal_service)

    rclpy.shutdown()


if __name__ == '__main__':
    main()

and a slightly modified service client:

from example_interfaces.srv import AddTwoInts
import rclpy
from rclpy.node import Node

from rclpy.executors import MultiThreadedExecutor
from rclpy.callback_groups import MutuallyExclusiveCallbackGroup


class MinimalClient(Node):
    def __init__(self):
        super().__init__("minimal_client_async")
        self.cb = None
        self.timer_cb = MutuallyExclusiveCallbackGroup()
        self.cli = self.create_client(
            AddTwoInts, "add_two_ints", callback_group=self.cb
        )
        while not self.cli.wait_for_service(timeout_sec=1.0):
            self.get_logger().info("service not available, waiting again...")
        self.req = AddTwoInts.Request()

        # Create a timer for the main loop execution
        self.timer = self.create_timer(
            1.0, self.main_loop, callback_group=self.timer_cb
        )

    def send_request(self, a, b):
        self.get_logger().info("send_request: enter")
        self.req.a = a
        self.req.b = b

        # Only works once, then never executes the timer again
        self.future = self.cli.call_async(self.req)
        rclpy.spin_until_future_complete(self, self.future)
        self.get_logger().info("send_request: exit")
        return self.future.result()

        # Used to work, but blocks now
        # return self.cli.call(self.req)

    def main_loop(self) -> None:
        response = self.send_request(4, 2)
        self.get_logger().info(
            "Result of add_two_ints: for %d + %d = %d" % (4, 2, response.sum)
        )


def main(args=None):
    rclpy.init(args=args)

    minimal_client = MinimalClient()

    executor = MultiThreadedExecutor()
    executor.add_node(minimal_client)
    executor.spin()

    rclpy.shutdown()


if __name__ == "__main__":
    main()

Run the service server in one terminal and the client in another.

Expected behavior

You repeatedly see the client call the server every second and get a response.

Actual behavior

The client runs correctly once, but then hangs (never executes the timer loop again)

Additional information

Previous to the 2022-09-28 patch release of foxy, a workaround was to use a synchronous call. However, that is non-functional now (see #1016 for more info/workarounds on that issue). As this behavior is functional in rclcpp in foxy, and seems to be what the documentation of foxy recommends it would be good to have this be functional.

@llapx
Copy link
Contributor

llapx commented Oct 13, 2022

The client runs correctly once, but then hangs (never executes the timer loop again)

The hangs happend when you use rclpy.spin_until_future_complete(self, self.future):

def spin_until_future_complete(
    node: 'Node',
    future: Future,
    executor: 'Executor' = None,
    timeout_sec: float = None
) -> None:
    """
    Execute work until the future is complete.

    Callbacks and other work will be executed by the provided executor until ``future.done()``
    returns ``True`` or the context associated with the executor is shutdown.

    :param node: A node to add to the executor to check for work.
    :param future: The future object to wait on.
    :param executor: The executor to use, or the global executor if ``None``.
    :param timeout_sec: Seconds to wait. Block until the future is complete
        if ``None`` or negative. Don't wait if 0.
    """
    executor = get_global_executor() if executor is None else executor
    try:
        executor.add_node(node)
        executor.spin_until_future_complete(future, timeout_sec)
    finally:
        executor.remove_node(node)
  • when you pass params without executor, then rclpy.spin_until_future_complete will use global executor which not the same executor as you defined.

  • when you pass params with executor, then rclpy.spin_until_future_complete will remove the node finally (the node and its timer never scheduled again).

So IMO, rclpy.spin_xxx may conflict with executor.spin_xxx (which you defined), it's better to use executor.spin_until_future_complete directly.

@apockill
Copy link
Contributor

This may be a symptom of the cause found in #1016

@llapx
Copy link
Contributor

llapx commented Oct 14, 2022

@apockill

I have tested with latest of rclpy, which already applied the commit in #1016, but not work.

@llapx
Copy link
Contributor

llapx commented Oct 14, 2022

Here's my solution to the client:

...
    def send_request(self, a, b):
        self.get_logger().info("send_request: enter")
        self.req.a = a
        self.req.b = b

        self.future = self.cli.call_async(self.req)
        while rclpy.ok():
            if self.future.done() and self.future.result():
                self.get_logger().info("send_request: exit")
                return self.future.result()

        return None
...

@fujitatomoya
Copy link
Collaborator

I think this is adding the same node base to the different and multiple executor, which does not work. (this is because GuardCondition does not work, rclcpp does have the same behavior.)

@llapx 's suggestion works, but i would recommend that you could use async/await instead, cz you do not want to consume the extra CPU to wait for the future completed.

...<snip>
    async def send_request(self, a, b):
        self.get_logger().info("send_request: enter")
        self.req.a = a
        self.req.b = b
        return await self.cli.call_async(self.req)

    async def main_loop(self) -> None:
        response = await self.send_request(4, 2)
        self.get_logger().info(
            "Result of add_two_ints: for %d + %d = %d" % (4, 2, response.sum)
        )
...<snip>

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Oct 14, 2022
@iuhilnehc-ynos
Copy link
Contributor

#1018 (comment) and #1018 (comment) seem not good to me.
What if users want to call a request by timeout?

I suggest using

+ self.executor.spin_until_future_complete(self.future)
- rclpy.spin_until_future_complete(self, self.future)

in the example.

Let me share my understanding of this issue.
In the main function,

    executor = MultiThreadedExecutor()
    executor.add_node(minimal_client)        

Please notice that there is a property named executor in the Node.
executor.add_node(minimal_client) makes the
minimal_client.executor = executor(which is the MultiThreadedExecutor() instance) in the

node.executor = self
.

after calling rclpy.spin_until_future_complete without executor parameter in the send_request,

executor.add_node(node)

the executor.add_node(node) will call the node.executor=self again,
and then the self.executor(which is the MultiThreadedExecutor() instance) adding the node(minimal_client) before will remove the node in

rclpy/rclpy/rclpy/node.py

Lines 279 to 283 in e18084f

current_executor = self.executor
if current_executor == new_executor:
return
if current_executor is not None:
current_executor.remove_node(self)

(This is the reason why the executor = MultiThreadedExecutor() not spin the events of minimal_client any more.)

rclcpp explicitly uses an exception to control One node should not be added to two Executors.
I think rclpy should also follow this strategy.

@jwhtkr
Copy link
Author

jwhtkr commented Oct 17, 2022

All, thanks for all the discussion and potential solutions. I would just like to note that the two solutions that @fujitatomoya and @iuhilnehc-ynos brought up (using async def/await and directly using the node's executor), while functional, are pretty un-intuitive to arrive at on one's own. I'm not sure if that means that the functionality needs to be adjusted to be more user-friendly, the documentation needs adjustments to explain these concepts better, or both of those courses. But as someone who is switching from ROS1 (i.e., I'm quite familiar with general ROS concepts like sub/pub srvs topics, etc), I have spent an inordinate amount of time trying to just figure out how to do what should be relatively straightforward things like this. For example, given the documentation (and some older questions/comments on sites like answers.ros.org that seemed to suggest it wouldn't work) I had no idea that rclpy worked with python's standard async def and await syntax.
Again, I want to express how appreciative I am of all the suggestions, discussion, and work that people have put in to help with these issues I've run into, as well as to develop ROS2, rclpy, etc. I just worry that the trees of features and improvements have distracted from the forest of overall usability that (at least for me) has been a solid point of ROS1 for many years now.

@fujitatomoya
Copy link
Collaborator

@molasses11 thank you so much for being constructive ! those are just suggestions, i am not saying that this is not the problem, that will be some areas that we can refine.

  • more concrete and clear example and documentation to use node executor.
  • migration from ROS 1 to ROS 2, documentation to avoid this problem.

let's keep this open, i am not sure i can have bandwidth for this soon though.

@808brick
Copy link

808brick commented May 5, 2023

I can confirm I am experiencing the same issue.

I tried experimenting with Executors and MutuallyExclusiveCallbackGroup like molasses11 with code which has timers/loops in it, and was able to get proper (non-blocking) functionality using Services by creating another node (within the same node using rclpy.create_node()) and having this new node send the request, and return the response it gets back to my main node (via class self).

This work-around however does not work with Actions. With the action all ROS functionality of the node gets frozen until the timer/loop finishes (ex: unable to get parameters or make service requests from the same node). And depending on how you're doing the looping (while loop with rate object, rclpy.spin_once, executor.spin_once) I sometimes have it where the Node refuses and new requests once the loop is finished (even if it accepted requests fine while looping).

I've found using MutuallyExclusiveCallbackGroup for the service callback option, and creating a custom executor for the node and using the executor.spin_once() call within a while loop (you need to make sure executor.spin_once() goes at the end of your loop, if not it causes issues) relieves the issue for services.

Currently reverting my Action to a Service since I can get around not needing feedback and request cancellation features for right now in my application, but hope this issue gets resolved.

Wanted to mention all this in case it helps show a correlation of the issue with client calls blocking future requests of the Node.

@meyerj
Copy link

meyerj commented Jan 12, 2024

We experienced similar issuses in different contexts. I think the problem is deeper, not only related to client calls and timers.

Essentially all implementations of rclpy.spin_*() methods in rclpy/__init__.py are somehow broken for the following reason, similar to what @iuhilnehc-ynos already outlined above:

  • If they are called recursively from within a callback, which is not explicitly forbidden, then the node instance was already added before to that executor. This results in False being returned from Executor.add_node(node), but the return value is not used in the spin functions.
  • Even if the node has not be added in this spin call, it will be removed from the executor unconditionally in the finally block.
  • So if any of those methods is used recursively, the node will not be spinned anymore afterwards.

I do think that recursive spinning is a desirable feature and apparently many users rely on that for implementing timeouts without blocking all other callbacks. Before recent #1188 calling a service with a timeout that works in any context was surprisingly hard, but the synchronous call() with a timeout does currently not allow other callbacks to run in between, like with recursive spinning. Even the most basic Python service client example in the official documentation uses rclpy.spin_until_future_complete(self, self.future), but in a method called directly from main() and not by an already running executor, like if a service would be called in another callback.

Suggestion:

So probably the "best" solution would be to try adding the node to the executor for spinning, but keep the return value and only remove it in finally if it was successfully added before? Something like

    try:
        was_added = executor.add_node(node)
        executor.spin_until_future_complete(future, timeout_sec)
    finally:
        if was_added:
            executor.remove_node(node)

of with a context manager that temporarily adds the node to the executor if it was not added before, instead of the repeated try ... finally construct.

Alternatively there must be a way to call executor.spin_until_future_complete(future, timeout_sec) directly without a specific node instance if it was already added to the executor before, like inside another callback of that node.

Something like

node.executor.spin_until_future_complete(future, timeout_sec)

instead of

rclpy.spin_until_future_complete(future, timeout_sec)

should also solve the problem described in this issue, and similar ones.

When checking on how to find the current executor of a node, I also found that Executor.add_node(node) is always called twice recursively, because it invokes the setter for node.executor which in turn calls back into executor.add_node(node). This only works because executor._nodes_lock is a recursive lock and add_node(node) checks whether the node was already added to self._nodes before.

Afaik there is currently no way to get the executor of the current thread, if any, without a reference to the node? I think asyncio.get_event_loop() uses thread local storage for that. So calls like rclpy.spin_until_future_complete(node, self.future) from within a callback would also fail, or spin the wrong executor, if used inside a callback that is executed by another than the global executor. If there was such a way, then using the executor of the current thread as the default executor in global rclpy.spin_*() functions would be a reasonable default, and only fall back to the global executor in case the call is from a thread context without any executor, like the main thread before any spin function was called. Of course that would be a quite severe change to the basics of rclpy...

Interestingly, in rclcpp recursively spinning a node results in an exception being thrown in Executor::add_node(), so it behaves fundamentally different (in version 6f6b5f2). See also @wjwwood's related TODO comment in rclcpp::spin_node_until_future_complete(). On the long term it would be nice if the two most widely used client libraries behave the same with this respect.

@meyerj meyerj linked a pull request Jan 20, 2024 that will close this issue
@jackkolb
Copy link

jackkolb commented Dec 2, 2024

Hi! This issue is still present in the current Foxy version (1.0.13). We are trying to get around it by creating an intermediary node to interface with the service. Any updates on the pending merge request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants