
ros::Time() in ROS-control update() call causes timing inconsistencies. #54

Closed
markusgft opened this issue Mar 11, 2019 · 12 comments

@markusgft

The Franka Panda hardware interface for ROS-control implements a method called FrankaHW::control(), which triggers the user-defined ROS-control update(ros::Time time, ros::Duration period) call that must be implemented by every valid ROS controller.

As can be seen here, the argument "period" is assigned the time-step, which is constantly 1 ms and therefore is perfectly valid.

However, the argument "time" gets assigned ros::Time::now(). The issue is that Time::now() is not perfectly in sync with the Franka hardware system time, as a user running a real-time control loop would expect. (One reason why Time::now() is not in sync is, for instance, that the UDP communication with the robot may experience delays, in which case ros::Time::now() may deviate significantly from the current Panda system time.)

Would it make sense to assign a different value to "time", i.e. the system time value from the currently read robot state?

=====================================================
Side-track: one motivating example of why this is of practical concern:

  • Consider tracking a perfectly linear position trajectory in a custom ROS controller, using the Franka joint position interface.
  • The Panda internally employs a joint PD controller; I assume that the required reference velocity is obtained implicitly from the position trajectory through some filtering step. This internal filter likely uses constant time steps of 1 ms.
  • If you evaluate your trajectory at constant increments of 1 ms, the resulting position update will be constant, and therefore so will the resulting reference velocity.
  • If you, as a user, instead evaluate your tracking controller using the "time" parameter (rather than the duration), you evaluate your perfectly linear reference trajectory at non-uniform time spacing and send this command to the joint position interface. The arriving position increments will be non-constant, and hence the estimate for the joint velocity will be excited by the "noise" on the position update.
  • I am not entirely sure what role the internal velocity filter plays, but in practice I have seen the Panda robot go unstable when using the "time" parameter for reference tracking with the position interface.
@fwalch commented Mar 11, 2019

Would it make sense to assign a different value to "time", i.e. the system time value from the currently read robot state?

That is a very good question. It is not immediately obvious to me how the "time" parameter is supposed to be set up. The ros_control docu only says "The current time", which we understood as "ROS system time", i.e. ros::Time::now(). But yes, that value isn't really usable for writing a controller (and we always used the duration instead in our examples), so our implementation doesn't make a whole lot of sense. I think you are correct and we should change this to franka::RobotState::time. We'll discuss this internally.

@gavanderhoorn

In my experience, ros::Time::now() can be used in three cases:

  1. you're using a clock on the machine that calls it that is synchronised with whatever is "on the other side" of the hardware_interface (this would be the case where a PC for instance has been equipped with a motion controller card, or a clock-synchronised bus is used between the PC and an actuator)
  2. the delay between the machine that runs the hardware_interface and the system that is connected to the actuators is known, and the latter is a system that "adheres" to case 1
  3. the time argument to the update(..) call is used to publish(..) ROS messages

In other cases (such as the one described by @markusgft) it would be best to use period, as that is independent of clock synchronisation (or at least: as long as the passage of time is sufficiently in-sync (ie: a second is a second, everywhere)).

The third situation is actually an interesting one: if publications of ROS messages are stamped with the time "coming from the hardware" and the clocks are not in sync, temporal correlation of messages coming from the ros_control stack with other dataflows in the ROS application becomes impossible. As an example: consider a TF broadcaster that uses some Cartesian data from a hardware_interface. If the clocks are not in sync, those TF frames will slowly become post- or pre-dated, making lookups more and more problematic or leading to unnecessary interpolation by TF.

What is often done in hardware_interfaces that have access to some sort of time or clock coming from an underlying system is to use that clock -- as it often has a higher precision or is the one that inputs (to that underlying system) are expected to be in-sync with -- for the calculation of period only, and use ros::Time::now() for everything else.

If time must absolutely be in sync with the underlying hardware, then that too could come from that hardware, but then ros::Time::now() is typically used in a way that can be correlated to the hw clock (using some sort of transformation scheme, possibly after a calibration period (ie: determine rate(local_clock) relative to rate(remote_clock))), or only for publications whose stamps must make sense when compared to other ros::Time derived stamps (such as the TF example above).

The ros_control docu only says "The current time", which we understood as "ROS system time", i.e. ros::Time::now(). But yes, that value isn't really usable for writing a controller (and we always used the duration instead in our examples), so our implementation doesn't make a whole lot of sense.

please note: there are multiple clock sources available when using ros::Time. Not all of them are useful in a ros_control context, but it's certainly not true that use of "ROS system time" universally doesn't make sense.

@fwalch commented Mar 11, 2019

@gavanderhoorn Thanks for your post! I hadn't considered use-cases such as message publishing. I suppose you could always call ros::Time::now() yourself instead of using the time given to the ros_control callbacks, but I can also see how it could be confusing if there is a ros::Time instance which follows a different clock from "standard ROS", which is probably not what you expect. So it would need good documentation.

But, to summarize, would you recommend to keep the current behavior, i.e.: keep time with ros::Time::now() as a way to do things like message publishing and advise controller authors to use period for calculating control values?

please note: there are multiple clock sources available when using ros::Time

Are you referring to things like ros::WallTime and ros::SteadyTime?

@markusgft

Hi all, thanks a lot for your instant replies, and thanks @gavanderhoorn for highlighting some of the related issues. After all, the problems seem to stem from an unfortunate combination of multiple factors. In a general ros_control context, I do see the point that a naive user should be able to use the ros::Time argument of the update loop for message time-stamping, etc.

If I understand correctly, ros::Time is currently not at all linked to the Panda robot's system clock (since a non-deterministic UDP protocol is used for communicating with the robot). So you are, with good arguments, suggesting to stick to "ros::Duration period" instead.
Certainly, in the originally mentioned trajectory tracking context, this at least allows you to sample a feedforward trajectory with constant time-spacing and avoid some trouble with the Panda-internal reference filter. And I do agree that all would be fine if you could guarantee that the 1 ms control period is held reliably and there are only very few packet losses.

There is still one thing I would like to clarify:

  • Using the Panda, we get a non-negligible rate of packet loss (according to the test executable "communication test" that is shipped with libfranka, up to 10% packet loss can be considered a "success"; in my own work I normally see 2-4% packet loss). How do you currently handle this in ros_control? While you are most likely just keeping the entries in the robot state constant in the event of packet loss, what is the current implementation of the hardware interface doing in terms of timing? Are you calling the control() routine anyway? Or in other words, after 1000 control cycles with a packet loss rate of 2-4%, did the ros_control update call get called 1000 times or only 960-980 times? In the latter case any "duration"-based controller would start to lag behind the true time, and that would be a strong argument against using "duration" for control purposes.
  • In that context, even if you don't want to change the current implementation of the time-stamp setting in ros_control, I assume it might make sense to use the time-stamp from the robot state for control and estimation purposes anyway?

@fwalch commented Mar 11, 2019

Or in other words, after 1000 control cycles with a packet loss rate of 2-4%, did the ros_control update call get called 1000 times or only 960-980 times?

In such a case, the ros_control update will be called fewer than 1000 times. However, the duration won't always be 1 ms, but sometimes larger (which is also what's defined by ros_control: "The time passed since the last call to update"). If you sum up the durations, you will still get the full 1000 ms.

@gavanderhoorn

@markusgft wrote:

Using the Panda, we get a non-negligible rate of packet loss (according to the test executable "communication test" that is shipped with libfranka, up to 10% packet loss can be considered a "success"; in my own work I normally see 2-4% packet loss).

That is a lot of packet loss tbh. The robots I work with -- and that have this type of motion interface -- wouldn't tolerate that. A single missed packet would mean a lost control cycle and the robot goes into an error mode (they do tolerate more missed packets if configured that way, but I generally don't use that). Franka Emika might have gone for a slightly less "picky/strict" policy of course.

Is the ethernet segment between the PC and the controller used for other traffic? Is there a switch?

With a real-time kernel (ie: RT PREEMPT or Xenomai) a 1kHz update loop should be trivial to run on a modern-ish PC.

How do you currently handle this in ros_control? While you are most likely just keeping the entries in the robot state constant in the event of packet loss, what is the current implementation of the hardware interface doing in terms of timing? Are you calling the control() routine anyway? Or in other words, after 1000 control cycles with a packet loss rate of 2-4%, did the ros_control update call get called 1000 times or only 960-980 times? In the latter case any "duration"-based controller would start to lag behind the true time, and that would be a strong argument against using "duration" for control purposes.

I'm guessing this question was directed at @fwalch, but to add something I've observed (and implemented myself): typical real-time external motion interfaces send time-stamped state data (in some form or other) and expect control signals to be returned in the same (or next) cycle. The timestamps are set by the controller using its internal clock.

As long as period is calculated using the timestamps on the incoming state data, missed packets on the PC side would result in dt = pkt_current.stamp - pkt_prev.stamp just getting larger and larger.

I guess this is what @fwalch means when he writes:

If you sum up the durations, you will still get the full 1000 ms.

If the ros_control control flow is also directly coupled with the robot controller's (ie: through the reception of "state pkts" or other triggers), timing of update-cycles would also be tied to the controller's control cycle and lost pkts could be handled gracefully, provided a time-out is used and some form of state extrapolation. Or with a suitable stopping strategy on the controller and an error on the PC side.

Certainly, in the originally mentioned trajectory tracking context, this at least allows you to sample a feedforward trajectory with constant time-spacing and avoid some trouble with the Panda-internal reference filter. And I do agree that all would be fine if you could guarantee that the 1 ms control period is held reliably and there are only very few packet losses.

I've highlighted the two important assumptions here. Controllers that cannot reach 1 kHz update rates should not be used (I would consider using one anyway a user error), and packet loss should either be absent or indeed very, very minimal.


Note: I did not want to argue against the need for a solution to the potential problem you described in your OP. I merely wanted to add some observations and my experience with ros_control and real-time external motion interfaces that have separate clocks.

@gavanderhoorn

@gavanderhoorn Thanks for your post! I hadn't considered use-cases such as message publishing. I suppose you could always call ros::Time::now() yourself instead of using the time given to the ros_control callbacks, but I can also see how it could be confusing if there is a ros::Time instance which follows a different clock from "standard ROS", which is probably not what you expect. So it would need good documentation.

The biggest problem I'd have with calling ros::Time::now() in every controller that would like to know current "ROS time" would be that you'd lose the correlation between data coming from controllers called in the same ControllerManager::update(..).

And it's not just publications: synchronisation between different controllers could also be affected.

But, to summarize, would you recommend to keep the current behavior, i.e.: keep time with ros::Time::now() as a way to do things like message publishing and advise controller authors to use period for calculating control values?

Thing is: I don't know what would be the best solution in your case. Perhaps some of the ros_control maintainers / devs have some ideas: @bmagyar, @ipa-mdl?

please note: there are multiple clock sources available when using ros::Time

Are you referring to things like ros::WallTime and ros::SteadyTime?

Yes.

@mathias-luedtke

I think this is the way to go: PickNikRobotics/ros_control_boilerplate#4 (comment)
At least for joint_trajectory_controller, which just tracks ros::Time() and the accumulated periods ("uptime") and synchronizes them only when a goal arrives.

@markusgft

That is a lot of packet loss tbh. The robots I work with -- and that have this type of motion interface -- wouldn't tolerate that. A single missed packet would mean a lost control cycle and the robot goes into an error mode (they do tolerate more missed packets if configured that way, but I generally don't use that). Franka Emika might have gone for a slightly less "picky/strict" policy of course.

With a real-time kernel (ie: RT PREEMPT or Xenomai) a 1kHz update loop should be trivial to run on a modern-ish PC.

Yes, that is indeed an inconveniently high packet loss rate that we experience on the Panda. I am used to working with RT systems like Xenomai -- in prior work, I worked with an EtherCAT-based robot running at 1 kHz hard real-time, where only a few packet losses occurred per hour. It is side-tracking the original topic, but in any case it would certainly help a lot if @fwalch could give a short comment on the "average expected" packet loss rate on the Panda (operated with a PREEMPT_RT patch as described in the Franka installation tutorial).

@bmagyar commented Mar 12, 2019

But, to summarize, would you recommend to keep the current behavior, i.e.: keep time with ros::Time::now() as a way to do things like message publishing and advise controller authors to use period for calculating control values?

I'm a little late to the party but my preferred solution is just that. We only have a few occasions where ros::Time::now() is called from within a controller in ros_controllers that I'd like to remove :)

As a general rule of thumb: your closed-loop update()-s should use time for message stamping and timestamp comparison, and period for dt.
There are cases where you could compute dt between commands using time to figure out how to take action on them (skip, or use a different dt).

In my experience with upstream controllers and also some closed-source, one has to be mindful about how time is passed to the update loop in the RobotHW implementation and how that relates to the rest of the system. Distributed setups make this a little harder, for those cases I'd rely less on timestamps.

@markusgft

Thanks for all the comments. So, to sum up: the result is to leave everything as is.

As far as I understand it, this is still a compromise: in an ideal case, ros::Time would be synchronized with some kind of master system clock, but this is not feasible with the current Panda system design.

@bmagyar, I wonder if it would make sense to include a little comment somewhere in ROS-control to document your above reasoning? I can imagine I'm not the only one who gets the intended use of ros::Time in the update call wrong.
