[Kernel] Load the protocol and metadata from the CRC files when available #4077
… in DeltaLog Initial read current version CRC w test to verify wip end-2-end working + tests
Co-authored-by: Allison Portis <[email protected]>
Co-authored-by: Venki Korukanti <[email protected]>
Looks great! Left some comments.
    ChecksumReader.getVersionStats(
        engine, logSegment.logPath, snapshotVersion, crcSearchLowerBound);
if (versionStatsOpt.isPresent()) {
  // We found the protocol and metadata for the version we are looking for
Can we add a check that the version in versionStatsOpt.get() is > (or >=, whichever is correct) the snapshot hint's version?
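A minimal sketch of the kind of guard being asked for. The `VersionStats` class here is a hypothetical stand-in that carries only a version, `atOrAfterHint` is an illustrative name, and the choice of `>=` rather than `>` is an assumption, since the comment above leaves that open.

```java
import java.util.Optional;

final class CrcVersionCheck {
  /** Hypothetical stand-in for the PR's stats holder; only the version matters here. */
  static final class VersionStats {
    final long version;

    VersionStats(long version) {
      this.version = version;
    }
  }

  /**
   * Keeps the CRC-derived stats only when they are not older than the snapshot hint.
   * With no hint present, the stats are accepted as-is.
   * Assumption: ">=" is the right comparison; the review leaves ">" vs ">=" undecided.
   */
  static Optional<VersionStats> atOrAfterHint(
      Optional<VersionStats> statsOpt, Optional<Long> hintVersion) {
    return statsOpt.filter(
        stats -> !hintVersion.isPresent() || stats.version >= hintVersion.get());
  }
}
```

If the stats are filtered out, the caller would presumably fall back to the hint or to regular log replay.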
try (CloseableIterator<FileStatus> crcFiles =
    engine.getFileSystemClient().listFrom(lowerBoundFilePath.toString())) {
  List<FileStatus> crcFilesList = new ArrayList<>();
  crcFiles
      .filter(file -> isChecksumFile(new Path(file.getPath())))
      .forEachRemaining(crcFilesList::add);
@scottsand-db I wonder if this can be refactored to use any of the log listing methods in your PR?
LGTM!
public static <T> List<T> toFilteredList(
    CloseableIterator<T> iterator, Function<T, Boolean> filter) {
  List<T> result = new ArrayList<>();
  iterator.filter(filter).forEachRemaining(result::add);
  return result;
}
Sorry, nit: I think it's easier if this is just toList and we do the iterator filtering in the outer call. That way we can reuse it even when we don't want to filter, i.e. toList(iterator.filter(filter)).
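The suggestion could look roughly like the sketch below. To stay self-contained it uses plain `java.util.Iterator` rather than Kernel's `CloseableIterator`, so the static `filter` helper is a hypothetical stand-in for `iterator.filter(...)`, and the question of who closes the iterator is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

final class IteratorLists {
  /** Drains the iterator into a new list; no filtering happens here. */
  static <T> List<T> toList(Iterator<T> iterator) {
    List<T> result = new ArrayList<>();
    iterator.forEachRemaining(result::add);
    return result;
  }

  /** Hypothetical stand-in for CloseableIterator.filter: lazily skips non-matching elements. */
  static <T> Iterator<T> filter(Iterator<T> source, Predicate<T> keep) {
    return new Iterator<T>() {
      private T buffered;
      private boolean ready = false;

      @Override
      public boolean hasNext() {
        // Advance until a matching element is buffered or the source is exhausted.
        while (!ready && source.hasNext()) {
          T candidate = source.next();
          if (keep.test(candidate)) {
            buffered = candidate;
            ready = true;
          }
        }
        return ready;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        return buffered;
      }
    };
  }
}
```

A caller that wants filtering composes the two, for example `IteratorLists.toList(IteratorLists.filter(files.iterator(), f -> f.endsWith(".crc")))`, while a caller that wants everything calls `toList` directly.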
Just a few minor remaining comments then LGTM
Looks great! Unfortunately, takeWhile definitely needs to be its own PR.
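For readers unfamiliar with the operation, here is a sketch of the conventional meaning of `takeWhile` on an iterator: yield elements until the predicate first fails, then stop. This is an assumption about what the split-out change does, not the PR's code, and it is written against plain `java.util.Iterator` with illustrative names.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

final class TakeWhileSketch {
  /** Returns a view of source that ends at the first element failing the condition. */
  static <T> Iterator<T> takeWhile(Iterator<T> source, Predicate<T> condition) {
    return new Iterator<T>() {
      private T buffered;
      private boolean ready = false;
      private boolean done = false;

      @Override
      public boolean hasNext() {
        if (ready) {
          return true;
        }
        if (done || !source.hasNext()) {
          return false;
        }
        T candidate = source.next();
        if (condition.test(candidate)) {
          buffered = candidate;
          ready = true;
          return true;
        }
        // First failing element: stop for good, even if later elements would pass.
        done = true;
        return false;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        return buffered;
      }
    };
  }
}
```

Unlike a filter, this consumes one element past the accepted prefix from the source and never resumes.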
Which Delta project/connector is this regarding?
Description
Cherry-pick the CRC loading code onto the master branch from "kernel-20250115-crc-optimization".
PR created using:
git cherry-pick 7d32f66d9efd1dca41ed2a45cf259525cbdd8952
How was this patch tested?
Does this PR introduce any user-facing changes?
No