
[BUG] Some data is lost during transmission. #1

Open
XLzed opened this issue Oct 18, 2022 · 13 comments

@XLzed

XLzed commented Oct 18, 2022

Describe the bug
Some data is lost during transmission, which causes an exception in the gRPC HTTP/2 deframer, and the Netty benchmark example hangs while waiting for all data.

Steps to Reproduce

  • grpc command: ./build/example/install/hadronio/bin/hadronio grpc benchmark -m 10000 -rs 10000 -as 10000 -r 0.0.0.0
  • netty command: ./build/example/install/hadronio/bin/hadronio netty benchmark throughput -s -l 100000 -m 1000

Additional info

  • grpc exceptions
    • Stream x does not exist
    • Frame of type 0 must be associated with a stream.
    • INTERNAL: Encountered end-of-stream mid-frame
    • Frame length: x exceeds maximum: y
  • netty benchmark throughput hangs
    (screenshot of the hanging benchmark omitted)
@fruhland
Contributor

Can you please provide some information on your test system? Especially, which type of network interconnect are you using (Ethernet, InfiniBand, etc.)?
The only error I recognize is "Stream x does not exist" from gRPC, but for me, it only occurs on a specific system and the benchmarks work fine on other systems.

@XLzed
Author

XLzed commented Oct 18, 2022

> Can you please provide some information on your test system? Especially, which type of network interconnect are you using (Ethernet, InfiniBand, etc.)? The only error I recognize is "Stream x does not exist" from gRPC, but for me, it only occurs on a specific system and the benchmarks work fine on other systems.

I tested it locally and the machine has no RDMA device, so the examples run over TCP only (I also set UCX_TLS=tcp).

System Info

  • Linux version 4.19.95-17 (root@runner-857a6918-project-16016-concurrent-0) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC))
  • openjdk 11.0.16 2022-07-19
  • OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu118.04)
  • OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu118.04, mixed mode, sharing)
  • UCX version: 1.13.1
  • ucx_info: ucx_info.log

Sequence Number Test

I also added an additional seqNumber at the head of each message for debugging, and found that some messages are lost or not retrieved correctly. The logs contain lines like: [WRN][HadronioSocketChannel] recv sequence number error, required [159], but get [290]

  • command: ./build/example/install/hadronio/bin/hadronio netty benchmark throughput -s -l 1000 -m 100000
    client.log server.log
  • command: ./build/example/install/hadronio/bin/hadronio grpc benchmark -m 100 -rs 10000 -as 10000 -s
    grpc-client.log grpc-server.log
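The debugging approach described above (prepending a sequence number to each message and checking it on receive) can be sketched roughly as follows. This is an illustrative reconstruction, not hadroNIO's actual code; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the sequence-number check described above.
public class SeqNumberCheck {
    private int nextExpected = 0;

    // Sender side: prepend a 4-byte sequence number to the payload.
    public static ByteBuffer frame(int seq, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(seq).put(payload).flip();
        return buf;
    }

    // Receiver side: return the payload, warning if a message was
    // lost or reordered (like "required [159], but get [290]").
    public byte[] check(ByteBuffer buf) {
        int seq = buf.getInt();
        if (seq != nextExpected) {
            System.err.printf("[WRN] recv sequence number error, required [%d], but get [%d]%n",
                    nextExpected, seq);
        }
        nextExpected = seq + 1;
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return payload;
    }
}
```

A check like this pinpoints reordering without changing the transport itself, which is why it surfaces the warning even though no bytes are actually dropped on the wire.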

I also tested between two machines that support RoCEv2, and the exception occurred there as well. Some information about the RDMA test environment:

  • Ethernet controller: Mellanox Technologies MT28850
  • MLNX_OFED_LINUX-5.4-3.4.0.0
  • rdma-core v35.4

I can use UCX and ibverbs to communicate directly, so maybe the logic of tag_send/recv or of the RingBuffer causes this problem?

@XLzed
Author

XLzed commented Oct 19, 2022

If I force sendTaggedMessage to be blocking, the examples work fine.

//      final boolean completed = endpoint.sendTaggedMessage(sendBuffer.memoryAddress() + index, messageLength, tag, true, blocking);
        final boolean completed = endpoint.sendTaggedMessage(sendBuffer.memoryAddress() + index, messageLength, tag, true, true);
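Forcing the send to be blocking effectively serializes completions: the next send is not issued until the previous request has finished, so completion callbacks cannot fire out of order. A minimal sketch of that idea, with illustrative interfaces (these are not the actual hadroNIO/JUCX types):

```java
// Hypothetical sketch: a blocking send drives UCX progress until the
// request completes before the caller can issue the next send, so
// completions happen strictly in send order.
interface Request { boolean isCompleted(); }
interface Worker { void progress(); }

final class BlockingSend {
    static void awaitCompletion(Worker worker, Request request) {
        while (!request.isCompleted()) {
            worker.progress(); // drive progress until the send finishes
        }
    }
}
```

The trade-off is throughput: blocking on every send gives up the pipelining that non-blocking tagged sends are meant to provide.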

@fruhland
Contributor

Thanks for the detailed report. I will try to reproduce the issue and have a look into what's going wrong.

@XLzed
Author

XLzed commented Oct 20, 2022

It seems that tag-matching sends do not complete strictly in order. Maybe we have to deal with out-of-order completion, or use a different UCX semantic? I don't know whether the data is still received in the same order as the receive buffers were submitted when the requests don't complete in order.

@fruhland
Contributor

According to this (openucx/ucx#6370), tag-matching messages will be received in order:

> Q: If I invoke two ucp_tag_send_nb on the same ep one by one, will these two send requests be completed in the invoke order? Does it matter whether I use RC or not?
> A: They may be completed in a different order, but will be matched in the same order on the receiver.
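The distinction in that answer is subtle enough to be worth a toy model: receives are matched in post order, but the completion callbacks can fire in a different order, e.g. when a large rendezvous transfer finishes later than a small eager one posted after it. The class below is only a simulation of that timing, not UCX code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model: message i is matched in posting order (0, 1, 2, ...),
// but its callback fires at completionTime[i], so the callback order
// can differ from the matching order.
public class MatchVsCallback {
    public static List<Integer> callbackOrder(int[] completionTime) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < completionTime.length; i++) order.add(i);
        order.sort(Comparator.comparingInt(i -> completionTime[i]));
        return order;
    }
}
```

For example, if message 0 is a slow rendezvous transfer, its callback may run after those of messages 1 and 2, even though it is matched first on the receiver.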

@Yangfisher1

We encountered the same problem with "Frame of type 0 must be associated with a stream". It happened when testing a gRPC demo backed by hadroNIO, both over TCP transport (on my local Mac) and over RDMA in a RoCEv2 environment. However, when we switched to an InfiniBand cluster, everything worked well. Is the problem solved?

@Yangfisher1

> We encountered the same problem with "Frame of type 0 must be associated with a stream". It happened when testing a gRPC demo backed by hadroNIO, both over TCP transport (on my local Mac) and over RDMA in a RoCEv2 environment. However, when we switched to an InfiniBand cluster, everything worked well. Is the problem solved?

Not exactly. I found the problem might be due to the size of the RingBuffer. When I reduce the size of the data transferred by gRPC, it works well.

@Yangfisher1

I don't know how to explain this, but when I set DEFAULT_BUFFER_SLICE_LENGTH to 16K, which is the default maximum size of an HTTP/2 data frame, the problem disappeared.
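One plausible reading of this observation (an assumption, not a confirmed explanation): with the default HTTP/2 SETTINGS_MAX_FRAME_SIZE of 16384 bytes, a buffer slice of at least 16K means a frame's payload never has to be split across multiple slices, so a single out-of-order slice is less likely to land mid-frame. A back-of-the-envelope check:

```java
// Illustrative only: how many slices a frame payload spans for a
// given slice length. The real interaction with hadroNIO's
// RingBuffer may be more subtle.
public class SliceMath {
    public static int slicesNeeded(int frameLength, int sliceLength) {
        return (frameLength + sliceLength - 1) / sliceLength; // ceiling division
    }
}
```

With a 16384-byte slice, a maximum-size default frame fits in one slice; with an 8192-byte slice it spans two, giving reordering more opportunities to corrupt the deframer's state.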

@XLzed
Author

XLzed commented Aug 29, 2024

> I don't know how to explain this, but when I set DEFAULT_BUFFER_SLICE_LENGTH to 16K, which is the default maximum size of an HTTP/2 data frame, the problem disappeared.

There are some bugs: the transport protocol it implements does not guarantee that received data can be processed in order. UCX's tag-matching semantics switch between the eager and rendezvous (rndv) protocols based on message size, reducing latency by replacing multiple sends with a single RDMA read. This can delay the completion callbacks of messages using the rndv protocol, but it does not affect the order in which the receive buffers are matched. However, the library uses the execution order of the callback functions as the parsing order of the received buffers, resulting in out-of-order packets.
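Given that analysis, one way to restore ordering (a sketch under the assumptions above, not hadroNIO's actual code) is to track receives in submission order and only hand a buffer to the parser once every earlier receive has completed too, instead of parsing in callback order:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reorder queue: deliver completed receive buffers to
// the parser strictly in the order they were posted, even if their
// completion callbacks fire out of order.
public class InOrderDelivery {
    private static final class Slot {
        final int id;
        boolean completed;
        Slot(int id) { this.id = id; }
    }

    private final ArrayDeque<Slot> pending = new ArrayDeque<>();
    private final List<Slot> all = new ArrayList<>();
    private final List<Integer> delivered = new ArrayList<>();

    // Called when a receive buffer is posted (submission order).
    public int post() {
        Slot s = new Slot(all.size());
        all.add(s);
        pending.addLast(s);
        return s.id;
    }

    // Called from the (possibly out-of-order) completion callback.
    public void complete(int id) {
        all.get(id).completed = true;
        // Deliver from the head only while the head has completed.
        while (!pending.isEmpty() && pending.peekFirst().completed) {
            delivered.add(pending.removeFirst().id);
        }
    }

    public List<Integer> delivered() { return delivered; }
}
```

With this scheme, a buffer whose callback fires early is simply held back until its predecessors complete, which matches the UCX guarantee that buffers are matched in post order.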

So I switched to using JUCX directly in my use case, which also avoids the data copy of the RingBuffer, but it requires more code development.

@Yangfisher1

@XLzed Thanks! The problem seems a little tricky.

Actually, we developed a version that uses JUCX directly to transmit data in gRPC. However, it requires modifying the RPC handler code, and we wanted a transparent solution, which is what this project seemed to offer. But it looks like it's still far from directly usable 😂

@fruhland
Contributor

We tested on different setups with ConnectX-3 and ConnectX-5 cards and never encountered this problem. It seems like InfiniBand cards are not affected by this.

@Yangfisher1

@fruhland I tested the demo on an IB cluster and a RoCEv2 cluster. I think it is not the IB cards but the IB switch that prevents the problem, because the RoCEv2 cluster also used ConnectX-6 cards while the underlying transport was based on UDP rather than IB.
