[Bug] Cannot determine whether the message is a duplicate at this time #21892
Comments
@graysonzeng Not related to the reported issue, but it's good to be aware that when using this type of config, it won't be optimal for read performance since sticky reads aren't used with bookies when E != Qw. More details in #18003 and apache/bookkeeper#4131. Related Pulsar Slack thread: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1699487100764749?thread_ts=1698225686.705339&cid=C5Z4T36F7
Thanks so much for the heads up! I definitely missed it. I will take the time to read it.
@graysonzeng does it reproduce without brokerEntryMetadataInterceptors ? |
@lhotari I haven't tested it again, but I think it's not reproducible without brokerEntryMetadataInterceptors. Currently it seems that a CompositeByteBuf is generated in addBrokerEntryMetadata only when brokerEntryMetadataInterceptors are enabled: pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java (line 1723 in 0b6bd70)
Search before asking
Version
Pulsar version: 3.1.1, master
Minimal reproduce step
broker count: 2
bookie count: 5
broker config:

```yaml
managedLedgerDefaultAckQuorum: "2"
managedLedgerDefaultEnsembleSize: "4"
managedLedgerDefaultWriteQuorum: "3"
# enable deduplication
brokerDeduplicationEnabled: "true"
# enable the interceptor
brokerEntryMetadataInterceptors: org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor
```
The batching producer is enabled (the default).
Using pulsar-perf, publish at a rate of 200000 messages/sec with a total of 100000000 messages, and consume at the same time:

```shell
bin/pulsar-perf produce persistent://pulsar/default/input_test -r 200000 -m 10000000
```
At the same time, use a function to consume and produce messages, and set the sequenceId on the producer in the function (using EFFECTIVELY_ONCE mode).
What did you expect to see?
Complete the production and consumption of all messages
What did you see instead?
The producer falls into the following error and stays stuck until the broker is restarted.
Anything else?
After the broker got stuck, a heap dump was taken and something unusual was discovered: the pendingAddOps queue of the LedgerHandle retains a lot of requests. The first request in the queue is not completed, with pendingWriteRequests = 0 and an empty addEntrySuccessBookies, yet the second request is in completed state.
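The head-of-line behavior described above can be sketched as follows. This is a simplified illustration, not the actual BookKeeper source; the class and method names below are hypothetical stand-ins for LedgerHandle.pendingAddOps and its success-callback draining. The point is that completed entries behind an uncompleted head can never be acknowledged:

```java
import java.util.ArrayDeque;

// Simplified sketch of head-of-line blocking in a LedgerHandle-style
// pendingAddOps queue: add callbacks fire strictly in order from the head,
// so one never-completed op at the head blocks every completed op behind it.
class PendingOpsSketch {
    static class Op {
        final long entryId;
        boolean completed;
        Op(long entryId) { this.entryId = entryId; }
    }

    final ArrayDeque<Op> pendingAddOps = new ArrayDeque<>();

    // Drain ops from the head only while they are completed;
    // stop at the first incomplete one.
    int drainCompleted() {
        int acked = 0;
        while (!pendingAddOps.isEmpty() && pendingAddOps.peek().completed) {
            pendingAddOps.poll();
            acked++;
        }
        return acked;
    }
}
```

With a stuck op at the head (as seen in the heap dump: not completed, pendingWriteRequests = 0), drainCompleted() returns 0 no matter how many later entries have completed.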
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/MessageDeduplication.java (line 381 in c834feb)

In isDuplicate of MessageDeduplication, the sequenceId falls between lastSequenceIdPersisted and highestSequencedPushed, which is why we receive the "Cannot determine whether the message is a duplicate at this time" error. The client received this error, then disconnected and resent the message. The resent message still had sequenceId > lastSequenceIdPersisted, so it fell into a loop.
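The decision logic described above can be sketched like this. This is a simplified, hypothetical rendering of the check in MessageDeduplication.isDuplicate, not the actual source; only the three-way outcome matters here:

```java
// Simplified sketch of the deduplication decision: a sequenceId above the
// highest pushed id is new; at or below the last persisted id it is a
// duplicate; anything in between has been pushed but not yet persisted,
// so the broker cannot decide and replies with
// "Cannot determine whether the message is a duplicate at this time".
public class DedupSketch {
    enum Status { NOT_DUPLICATE, DUPLICATE, UNKNOWN }

    static Status isDuplicate(long sequenceId,
                              long highestSequencedPushed,
                              long lastSequenceIdPersisted) {
        if (sequenceId > highestSequencedPushed) {
            return Status.NOT_DUPLICATE; // definitely new
        }
        if (sequenceId <= lastSequenceIdPersisted) {
            return Status.DUPLICATE;     // already persisted
        }
        return Status.UNKNOWN;           // pushed but not persisted: cannot decide
    }
}
```

Because the stuck add op never persists, lastSequenceIdPersisted never advances, so the client's resent message keeps landing in the UNKNOWN window and the loop never ends.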
Update
An important log message was found. It points to BookKeeper's DigestManager.computeDigestAndPackageForSendingV2():
https://github.com/apache/bookkeeper/blob/113d40ac5057709b3e44b9281231456b4ef81065/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L149
Under normal circumstances, after the digest is computed, the result is assigned to toSend and the payload is set to null:
https://github.com/apache/bookkeeper/blob/113d40ac5057709b3e44b9281231456b4ef81065/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L231
Therefore, we can see that the head of pendingAddOps still retains its payload while toSend is empty.

In BookKeeper's PendingAddOp.unsetSuccessAndSendWriteRequest(), if toSend is null, the method returns directly, so this request has been retained in pendingAddOps ever since computeDigest failed:
https://github.com/apache/bookkeeper/blob/113d40ac5057709b3e44b9281231456b4ef81065/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L183
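The early-return path can be sketched like this. This is a minimal, hypothetical simplification of PendingAddOp (fields and the method body are abbreviated, not the real class): once the digest computation throws before assigning toSend, the resend path bails out and the op is never completed:

```java
// Minimal sketch of why a failed digest computation strands the op:
// toSend is only assigned after the digest is successfully computed and the
// entry is packaged, and the resend path returns early when toSend is null,
// so the op sits in pendingAddOps forever.
class PendingAddOpSketch {
    Object payload = new Object(); // original entry buffer, still retained
    Object toSend = null;          // packaged buffer; stays null if digest computation threw

    // Mirrors the early return in unsetSuccessAndSendWriteRequest().
    boolean sendWriteRequest() {
        if (toSend == null) {
            return false; // returns directly; the op is never resent or completed
        }
        return true; // would actually fan out the write to the bookies
    }
}
```

This matches the heap dump: the head op keeps its payload (toSend was never assigned), is never resent, and therefore never completes, blocking everything queued behind it.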
Are you willing to submit a PR?