Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make test play services more reliable #1033

Closed

Conversation

ivanpauno
Copy link
Member

Maybe fixes #862.

This is a workaround, it doesn't fix the underlying issue.

Signed-off-by: Ivan Santiago Paunovic <[email protected]>
@ivanpauno ivanpauno added the enhancement New feature or request label Jun 24, 2022
@ivanpauno ivanpauno requested a review from MichaelOrlov June 24, 2022 19:09
@ivanpauno ivanpauno requested a review from a team as a code owner June 24, 2022 19:09
@ivanpauno ivanpauno self-assigned this Jun 24, 2022
@ivanpauno ivanpauno requested review from jhdcs and removed request for a team June 24, 2022 19:09
Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanpauno I am sorry but this is not what I would expected to see in our tests.
Our tests exists to make sure that there are no issues rather than try to hide any issues.

Sending request multiple times in loop until we will get response is not acceptable, at least in this case.
Because we have expectations in tests and workflow in particularly in play_next the number or requests and corresponding responses shall match to each other.

I think that issue with discovery which you are trying to mitigate is not a root cause of the issue which we have with failing test_play_services tests. Because according to the logs the failure never happened on first service request from the tests it rather happened on 3-4 request and more often on 100s and more with test_play_next scenario.

@ivanpauno
Copy link
Member Author

ivanpauno commented Jun 27, 2022

Because according to the logs the failure never happened on first service request from the tests

are you sure?
SetUp() is called before each test fixture (i.e. each TEST_F), so there're a lot of calls that happen right after the services are creaated.

it rather happened on 3-4 request and more often on 100s and more with test_play_next scenario

In all the CI links in #862, none of them seem to be failing there.

@ivanpauno I am sorry but this is not what I would expected to see in our tests.

If it's a service discovery issue, the alternatives are doing this or waiting for some seconds after creating the services.
Fixing the real issue is a lot of work.

If it's not a discovery issue and it's another thing, I have no idea what's going on.
I would have to debug more.

@MichaelOrlov
Copy link
Contributor

@ivanpauno

are you sure?
SetUp() is called before each test fixture (i.e. each TEST_F), so there're a lot of calls that happen right after the services are creaated.

I am 100% sure that this is not a discovery issue. And I can prove it.
For instance form failure of this job https://build.ros2.org/job/Rpr__rosbag2__ubuntu_jammy_amd64/122/consoleText

[INFO] [1655556476.597988951] [rosbag2_player_for_test_srvs]: Playing next message.
8: [INFO] [1655556476.598254146] [rosbag2_player_for_test_srvs]: Playing next message.
8: [INFO] [1655556476.598445490] [rosbag2_player_for_test_srvs]: Playing next message.
8: [INFO] [1655556476.598640724] [rosbag2_player_for_test_srvs]: Playing next message.
8: [INFO] [1655556476.598978766] [rosbag2_player_for_test_srvs]: Playing next message.
8: [INFO] [1655556476.599309567] [rosbag2_player_for_test_srvs]: Playing next message.
8: [INFO] [1655556476.599803300] [rosbag2_player_for_test_srvs]: Playing next message.
8: /tmp/ws/src/rosbag2/rosbag2_transport/test/rosbag2_transport/test_play_services.cpp:108: Failure
8: Expected equality of these values:
8:   future.wait_for(service_call_timeout_)
8:     Which is: 4-byte object <01-00 00-00>
8:   std::future_status::ready
8:     Which is: 4-byte object <00-00 00-00>
8: virtual void PlaySrvsTest_play_next_Test::TestBody(), line : 378
 8/31 Test  #8: test_play_services__rmw_fastrtps_cpp ...........................***Timeout  60.06 sec

Test test_play_services__rmw_fastrtps_cpp failed and call for

auto future = cli->async_send_request(request);
EXPECT_TRUE(future.valid());
EXPECT_EQ(future.wait_for(service_call_timeout_), std::future_status::ready) <<

was called from line : 378
for (size_t i = 0; i < num_msgs_to_publish_; i++) {
{
std::lock_guard<std::mutex> lk(got_msg_mutex_);
message_counter_ = 0;
}
play_next_response = service_call_play_next();
ASSERT_TRUE(play_next_response->success);
expect_messages(true, false);
}

From log it is very clear that we successfully called play_next service 186 times in loop and on 187 it got failed
and such a failure in the middle of the play_next test happening in 90% cases on CI the rest 10% mostly happened in
TEST_F(PlaySrvsTest, is_paused)

TEST_F(PlaySrvsTest, is_paused)
{
start_playback();
player_->toggle_paused();
ASSERT_TRUE(service_call_is_paused());
expect_messages(false);
player_->toggle_paused();
ASSERT_FALSE(service_call_is_paused());
expect_messages(true);
player_->toggle_paused();
ASSERT_TRUE(service_call_is_paused());
expect_messages(false);
player_->toggle_paused();
ASSERT_FALSE(service_call_is_paused());
expect_messages(true);
}

and failure never happened on FIRST ASSERT_TRUE(service_call_is_paused()); but rather on 3d or 4th call in the scope of the one unit test lines 326 and 330 respectively.

I also was able to reproduce this issue multiple times on my local machine and it never happens on first call for auto future = cli->async_send_request(request); EXPECT_EQ(future.wait_for(service_call_timeout_), std::future_status::ready)

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:50
@MichaelOrlov
Copy link
Contributor

Here is a fresh example from today's CI failure https://github.com/ros2/rosbag2/runs/7197760772?check_suite_focus=true

[ RUN      ] PlaySrvsTest.set_rate_good_values
2022-07-05T14:48:47.6799425Z 32: stdin is not a terminal device. Keyboard handling disabled.[INFO] [1657032380.610015078] [rosbag2_player_for_test_srvs]: Set rate to 1
2022-07-05T14:48:47.6799580Z 32: [INFO] [1657032380.614651567] [rosbag2_player_for_test_srvs]: Adding keyboard callbacks.
2022-07-05T14:48:47.6799737Z 32: [INFO] [1657032380.614693967] [rosbag2_player_for_test_srvs]: Press SPACE for Pause/Resume
2022-07-05T14:48:47.6799908Z 32: [INFO] [1657032380.614709067] [rosbag2_player_for_test_srvs]: Press CURSOR_RIGHT for Play Next Message
2022-07-05T14:48:47.6800075Z 32: [INFO] [1657032380.614808867] [rosbag2_player_for_test_srvs]: Press CURSOR_UP for Increase Rate 10%
2022-07-05T14:48:47.6800240Z 32: [INFO] [1657032380.614864067] [rosbag2_player_for_test_srvs]: Press CURSOR_DOWN for Decrease Rate 10%
2022-07-05T14:48:47.6800372Z 32: [INFO] [1657032380.614940767] [rosbag2_player_for_test_srvs]: Pausing play.
2022-07-05T14:48:47.6800500Z 32: [INFO] [1657032380.618946657] [rosbag2_player_for_test_srvs]: Resuming play.
2022-07-05T14:48:47.6800703Z 32: [INFO] [1657032380.619361556] [rosbag2_player_for_test_srvs]: Playback until timestamp: -1
2022-07-05T14:48:47.6800828Z 32: [INFO] [1657032380.619548056] [rosbag2_player_for_test_srvs]: Set rate to 2
2022-07-05T14:48:47.6800963Z 32: [INFO] [1657032380.620571854] [rosbag2_player_for_test_srvs]: Set rate to 0.5
2022-07-05T14:48:47.6801240Z 32: /__w/rosbag2/rosbag2/ros_ws/src/jduflpqya0l/rosbag2/rosbag2_transport/test/rosbag2_transport/test_play_services.cpp:120: Failure
2022-07-05T14:48:47.6801341Z 32: Expected equality of these values:
2022-07-05T14:48:47.6801452Z 32:   future.wait_for(service_call_timeout_)
2022-07-05T14:48:47.6801620Z 32:     Which is: 4-byte object <01-00 00-00>
2022-07-05T14:48:47.6814957Z 32:   std::future_status::ready
2022-07-05T14:48:47.6815222Z 32:     Which is: 4-byte object <00-00 00-00>
2022-07-05T14:48:47.6815416Z 32: virtual void PlaySrvsTest_set_rate_good_values_Test::TestBody(), line : 366
2022-07-05T14:48:47.6815603Z 32/78 Test #32: test_play_services__rmw_fastrtps_cpp ...................................***Timeout  60.05 sec

According to the log, error happened at line 366

TEST_F(PlaySrvsTest, set_rate_good_values)
{
start_playback();
auto set_request = std::make_shared<SetRate::Request>();
SetRate::Response::SharedPtr set_response;
GetRate::Response::SharedPtr get_response;
set_request->rate = 2.0;
set_response = service_call_set_rate(set_request);
ASSERT_TRUE(set_response->success);
get_response = service_call_get_rate();
ASSERT_EQ(get_response->rate, 2.0);
set_request->rate = 0.5;
set_response = service_call_set_rate(set_request);
ASSERT_TRUE(set_response->success);
get_response = service_call_get_rate();
ASSERT_EQ(get_response->rate, 0.5);
}

And this is second time calling get_response = service_call_get_rate(); in this test.
The first one was successfully called from line 360.

@ivanpauno
Copy link
Member Author

This doesn't seem to be the problem, so closing

@ivanpauno ivanpauno closed this Jul 15, 2022
@MichaelOrlov MichaelOrlov deleted the ivanpauno/make_test_play_services_more_reliable branch July 13, 2024 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

👨‍🌾 Regression in test_play_{timing,services}__rmw_{rmw_vendor} on the buildfarm jobs
3 participants