
[Kernel] Adds protocol checks to the public getChanges API on TableImpl #3651

Merged

Conversation

allisonport-db (Collaborator)

Which Delta project/connector is this regarding?

  • [ ] Spark
  • [ ] Standalone
  • [ ] Flink
  • [x] Kernel
  • [ ] Other (fill in here)

Description

To avoid reading invalid tables, Kernel should check that any Protocol action it reads is supported by Kernel. This PR makes the current API private and adds a public API around it that performs this check whenever Protocol is included in the set of actions to be read from the file.

Also removes the "byVersion" part of the API name since we are adding separate timestamp APIs in #3650.
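As a rough illustration of the pattern the description outlines (all class and method names below are simplified stand-ins, not Delta Kernel's actual API), the private raw API is wrapped by a public one that validates any Protocol action it encounters:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative stand-ins only; not Delta Kernel's real classes.
public class GetChangesSketch {
    // Reader features this sketch "supports" (illustrative set).
    static final Set<String> SUPPORTED_READER_FEATURES =
            new HashSet<>(Arrays.asList("columnMapping", "deletionVectors"));

    static class Protocol {
        final Set<String> readerFeatures;
        Protocol(Set<String> f) { readerFeatures = f; }
    }

    // A "batch" here is just the protocol values found in one commit file;
    // null means the row carried no protocol action.
    static Stream<List<Protocol>> getRawChanges(List<List<Protocol>> commits) {
        return commits.stream();  // stands in for the now-private raw API
    }

    // Public wrapper: validates every protocol action it sees.
    static Stream<List<Protocol>> getChanges(List<List<Protocol>> commits) {
        return getRawChanges(commits).map(batch -> {
            for (Protocol p : batch) {
                if (p != null && !SUPPORTED_READER_FEATURES.containsAll(p.readerFeatures)) {
                    throw new UnsupportedOperationException(
                            "Table requires reader features Kernel does not support");
                }
            }
            return batch;
        });
    }

    public static void main(String[] args) {
        List<List<Protocol>> ok = List.of(
                Arrays.asList(null, new Protocol(Set.of("columnMapping"))));
        System.out.println("batches read: " + getChanges(ok).count());

        List<List<Protocol>> bad = List.of(
                List.of(new Protocol(Set.of("someFutureFeature"))));
        try {
            getChanges(bad).count();  // stream is lazy; count() forces the check
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected unsupported table");
        }
    }
}
```

The key point is that the check happens per batch inside the wrapper, so callers of the public API can never silently read a table whose protocol Kernel does not understand.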

How was this patch tested?

Adds unit tests.


@vkorukanti vkorukanti left a comment


LGTM

return getRawChanges(engine, startVersion, endVersion, actionSet)
    .map(
        batch -> {
          int protocolIdx =
Collaborator


get the protocol col index once outside of the .map on line 207.

Collaborator Author


We can't access the schema without the batch.

@@ -48,19 +45,19 @@ public class TableFeatures {
   ////////////////////
 
   public static void validateReadSupportedTable(
-      Protocol protocol, Metadata metadata, String tablePath) {
+      Protocol protocol, Optional<Metadata> metadata, String tablePath) {
Collaborator


Can we just add a method that checks that readerFeatures doesn't contain anything Kernel doesn't know about? Similar to SUPPORTED_WRITER_FEATURES, add a SUPPORTED_READER_FEATURES?

Collaborator


With that, this method can also be refactored to use the set and then do specific metadata checks based on the reader feature (e.g. column mapping).

Collaborator Author


Factored out SUPPORTED_READER_FEATURES. It still seems fine to me to include the metadata as an optional parameter; if it's present we do the metadata checks.

A separate function would still need to check versions plus the features.
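A simplified sketch of the shape this discussion lands on (feature names and specific checks are illustrative, not Kernel's exact logic): validation runs against a SUPPORTED_READER_FEATURES set, and the metadata-dependent checks run only when a Metadata is present:

```java
import java.util.*;

// Simplified sketch with stand-in classes; not Delta Kernel's real code.
public class TableFeaturesSketch {
    // Illustrative set of reader features this sketch "supports".
    static final Set<String> SUPPORTED_READER_FEATURES =
            new HashSet<>(Arrays.asList("columnMapping", "deletionVectors", "timestampNtz"));

    static class Protocol {
        final int minReaderVersion;
        final Set<String> readerFeatures;
        Protocol(int v, Set<String> f) { minReaderVersion = v; readerFeatures = f; }
    }

    static class Metadata {
        final String columnMappingMode;  // e.g. "none", "name", "id"
        Metadata(String mode) { columnMappingMode = mode; }
    }

    static void validateReadSupportedTable(
            Protocol protocol, Optional<Metadata> metadata, String tablePath) {
        // Version check (threshold is illustrative).
        if (protocol.minReaderVersion > 3) {
            throw new UnsupportedOperationException(
                    "Unsupported reader version at " + tablePath);
        }
        // Feature check against the supported set.
        for (String feature : protocol.readerFeatures) {
            if (!SUPPORTED_READER_FEATURES.contains(feature)) {
                throw new UnsupportedOperationException(
                        "Unsupported reader feature '" + feature + "' at " + tablePath);
            }
        }
        // Metadata-dependent checks run only when metadata is available.
        metadata.ifPresent(m -> {
            if (protocol.readerFeatures.contains("columnMapping")
                    && !Set.of("none", "name", "id").contains(m.columnMappingMode)) {
                throw new UnsupportedOperationException(
                        "Unknown column mapping mode: " + m.columnMappingMode);
            }
        });
    }

    public static void main(String[] args) {
        Protocol p = new Protocol(3, Set.of("columnMapping"));
        validateReadSupportedTable(p, Optional.empty(), "/tmp/t");  // version + feature checks only
        validateReadSupportedTable(p, Optional.of(new Metadata("name")), "/tmp/t");
        System.out.println("ok");
    }
}
```

With Optional.empty(), callers that only have a Protocol (e.g. the getChanges path) still get the version and feature checks.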


@vkorukanti vkorukanti left a comment


Just realized one thing: what happens if the connector requests only ADDs and REMOVEs, but there is a protocol update between the requested versions? The updated protocol could make the returned ADD info wrong: a new protocol feature (unknown to Kernel) may add extra fields to ADD that Kernel won't be reading.


@scottsand-db scottsand-db left a comment


LGTM after implementing my minor feedback

return getRawChanges(engine, startVersion, endVersion, copySet)
    .map(
        batch -> {
          int protocolIdx = batch.getSchema().indexOf("protocol"); // must exist
Collaborator


is this a common pattern throughout kernel code?

i.e. have a batch, get the index of a particular column, and then for each non-null value apply a function?

would it be cleaner to abstract this to a method on top of the ColumnarBatch interface that lets us apply a function for all non-null values for a particular column name?

if this occurrence is rare, then LGTM, and no need for premature optimization

Collaborator Author


I don't think it's that common, but we probably do it elsewhere once or twice. I don't think an abstraction would necessarily belong on the ColumnarBatch interface, though; maybe if we end up supporting UDF-like expressions this could be done there.
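For illustration, the helper floated above could look roughly like this; ColumnarBatch here is a toy stand-in, and forEachNonNull is a hypothetical utility, not an existing Kernel method:

```java
import java.util.*;
import java.util.function.*;

// Toy sketch of "apply a function to every non-null value of a named column".
public class BatchHelperSketch {
    // Stand-in for a columnar batch: a schema plus column-major data.
    static class ColumnarBatch {
        final List<String> schema;
        final List<List<Object>> columns;
        ColumnarBatch(List<String> schema, List<List<Object>> columns) {
            this.schema = schema; this.columns = columns;
        }
    }

    // Hypothetical helper: looks up the column index once, then visits
    // each non-null value in that column.
    static void forEachNonNull(ColumnarBatch batch, String columnName, Consumer<Object> fn) {
        int idx = batch.schema.indexOf(columnName);
        if (idx < 0) return;  // column not present in this batch
        for (Object value : batch.columns.get(idx)) {
            if (value != null) fn.accept(value);
        }
    }

    public static void main(String[] args) {
        ColumnarBatch batch = new ColumnarBatch(
                List.of("add", "protocol"),
                List.of(Arrays.asList("a1", "a2", null),
                        Arrays.asList(null, null, "p1")));
        List<Object> seen = new ArrayList<>();
        forEachNonNull(batch, "protocol", seen::add);
        System.out.println(seen);  // only the non-null protocol values
    }
}
```

This keeps the index lookup in one place, which is the cleanup the comment was after, without committing the real ColumnarBatch interface to it.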

@allisonport-db allisonport-db merged commit 27cdcb9 into delta-io:master Sep 11, 2024
17 checks passed

3 participants