-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel]Cache the crc info for a snapshot if its version's checksum file is read #4113
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Requested some changes
// We found a CRCInfo of a version (a) older than the one we are looking for (snapshotVersion) | ||
// but (b) newer than the current hint. Use this CRCInfo to create a new hint. | ||
return new Tuple2<>( | ||
Optional.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have duplicate creations of snapshot hints from the CRC Info. I think you could make a SnapshotHint::fromCrcInfo
static helper in SnapshotHint.java
@@ -392,4 +372,57 @@ private Map<String, DomainMetadata> loadDomainMetadataMap(Engine engine) { | |||
throw new UncheckedIOException("Could not close iterator", ex); | |||
} | |||
} | |||
|
|||
// Calculates the latest snapshot hint before current snapshot version, returns the CRCInfo if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts:
-
Please use
/** */
for multi-line methoddocs, no//
-
Calculates the latest snapshot hint before current snapshot version
--> below, you also return it ifsnapshotHint.version == snapshotVersion
. So I don't think "before" is correct, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change it to "before or at the current snapshot version". Thanks for catching this boundary case.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/replay/LogReplay.java
Outdated
Show resolved
Hide resolved
s"INSERT INTO delta.`$tablePath` SELECT 1L as id" | ||
) | ||
assert( | ||
Files.deleteIfExists( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth making a helper function that deletes CRC at a given version in a given table path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could probably add it to TestUtils
@@ -17,18 +17,21 @@ package io.delta.kernel.defaults | |||
|
|||
import java.io.File | |||
import java.util.Optional | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: undo these deletes plz
kernel/kernel-api/src/main/java/io/delta/kernel/internal/replay/LogReplay.java
Show resolved
Hide resolved
@@ -384,7 +384,7 @@ class LogReplayEngineMetricsSuite extends AnyFunSuite with TestUtils { | |||
// Tests for loading P & M through checksums files // | |||
///////////////////////////////////////////////////////////////////////////////////////////////// | |||
|
|||
Seq(-1L, 3L, 4L).foreach { version => // -1 means latest version | |||
Seq(-1L, 0L, 3L, 4L).foreach { version => // -1 means latest version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0L for the case https://github.com/delta-io/delta/pull/4113/files#r1940334805
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please implement the one small change I've asked for, but then LGTM!
Optional<CRCInfo> crcInfoOpt = | ||
ChecksumReader.getCRCInfo( | ||
engine, logSegment.getLogPath(), snapshotVersion, crcSearchLowerBound); | ||
if (crcInfoOpt.isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's reduce indentation by checking and exiting earlier.
if (!crcInfoOpt.isPresent()) {
return new Tuple2<>(snapshotHint, Optional.empty());
}
// the rest of the code, but now it is not indented!
@@ -165,6 +172,11 @@ public long getVersion() { | |||
return logSegment.getVersion(); | |||
} | |||
|
|||
/** Returns the crc info for the current snapshot if the checksum file is read */ | |||
public Optional<CRCInfo> getCurrentCrcInfo() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason that we only save it if it's the current version? in the future won't we also want to still save the info for an earlier one? (to pass along as much info as possible)
Which Delta project/connector is this regarding?
Description
Persist CRCInfo for a logreplay.java if its version's checksum file is read. This will be used for calculate the new CRC info for a new commit. No actual functional changes to loading crc.
Follow up PR will be #4116 which introduces crc_simple post commit action to write CRC file
How was this patch tested?
Unit test
Does this PR introduce any user-facing changes?
No