-
Notifications
You must be signed in to change notification settings - Fork 702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Skip the checksum for diskless replication #1181
[WIP] Skip the checksum for diskless replication #1181
Conversation
Signed-off-by: Shivshankar-Reddy <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we avoid any kind of memory corruption by performing a checksum match? If CRC64 is expensive, we could look at other alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Shivshankar-Reddy Sorry had some urgent issues and holidays so was unable to look at it yet.
Overall I like this added ability!
General question I have: I noticed that you only disabled CRC calculation on the primary side? Basically both primary and replica are performing the CRC calculation. When the replica is completely aware that the data is sent and loaded diskless it can just skip CRC check even when the primary did calculate and report it (for example a replica syncing with an older version primary). the replica can probably know that the Primary is streaming via socket by the fact that it send the EOF marker.
startSaving(RDBFLAGS_REPLICATION); | ||
getRandomHexChars(eofmark, RDB_EOF_MARK_SIZE); | ||
if (error) *error = 0; | ||
if (rioWrite(rdb, "$EOF:", 5) == 0) goto werr; | ||
if (rioWrite(rdb, eofmark, RDB_EOF_MARK_SIZE) == 0) goto werr; | ||
if (rioWrite(rdb, "\r\n", 2) == 0) goto werr; | ||
if (server.repl_diskless_sync && req & REPLICA_REQ_CHKSUM_SKIP) | ||
skip_cksum_repl |= RDBFLAGS_CKSUM_SKIP; | ||
if (rdbSaveRio(req, rdb, error, skip_cksum_repl, rsi) == C_ERR) goto werr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably keep only one rdbSaveRio :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for noticing this. I am aware of this and removed in my local branch, this has had happened because I had merge conflict in the branch so created this by copying the code from original branch. it was missed during the copy paste
long rdb_diskless_load = 0; | ||
if (getRangeLongFromObjectOrReply(c, c->argv[j + 1], 0, 1, &rdb_diskless_load, NULL) != C_OK) return; | ||
if (rdb_diskless_load == 1) | ||
c->replica_req |= REPLICA_REQ_CHKSUM_SKIP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at this point the skip CRC is more "capability" than "requirement" on the replica side. So I would suggest adding a replconf capa indication instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially my plan also was to add in capabilities, However the capabilies were not passed many function as requirements being passed. Since requirements are so flexible in the available functions I added in the requirements. But I agree with you this should be part of the capa. PR is still in-progress I will try to make this in Capabilities instead of requirements. Thank you!
Thank you for the review and the input!
In my initial testing, found that replica was able to deteremine CRC is enabled/disabled so I thought it would be enough to disable at primary end. and I think you have the valid point to consider replica side aswell. I will look into this and will update the diff soon, thanks again! |
@Shivshankar-Reddy thank you so much for taking the time to handle this. I think that since it was in conflict with #1479 (which is almost completed) we will close this one. |
fixes: #1129
Below is the background information from #1129 by @madolson
Here is my understanding and proposal as a fix for the issue. Please correct me if my understanding is wrong or missing something!