[Spark][TEST-ONLY] Test identity high water mark inserts when target table has no high water mark (delta-io#4049)

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Test-only PR. Adds basic additional tests for inserting data into tables
whose identity column high water mark is not yet defined. Split from
delta-io#3989, the PR that makes identity column high water mark updates
more consistent.
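
The behavior under test can be pictured with a toy model. The sketch below is not Delta Lake code: `ToyIdentityTable`, `appendExplicit`, and `appendGenerated` are hypothetical names, and the rule that generated values continue from the high water mark is a simplifying assumption. It only illustrates the property the new test asserts, namely that user-supplied identity values leave an unset high water mark unset.

```scala
// Toy model only: not Delta Lake's implementation or API.
// All names here are hypothetical and exist purely for illustration.
final case class ToyIdentityTable(
    start: Long = 1L,
    step: Long = 1L,
    highWaterMark: Option[Long] = None, // None = no value generated yet
    ids: Vector[Long] = Vector.empty) {

  // User-supplied ids are stored as-is and leave the high water mark untouched.
  def appendExplicit(userIds: Seq[Long]): ToyIdentityTable =
    copy(ids = ids ++ userIds)

  // Assumption: generated ids continue from the high water mark (or from
  // `start` when it is unset) and move the mark to the last value handed out.
  def appendGenerated(count: Int): ToyIdentityTable = {
    val first = highWaterMark.map(_ + step).getOrElse(start)
    val generated = Vector.tabulate(count)(i => first + i * step)
    copy(
      ids = ids ++ generated,
      highWaterMark = generated.lastOption.orElse(highWaterMark))
  }
}

object ToyIdentityDemo {
  def main(args: Array[String]): Unit = {
    // Appending explicit values: the high water mark stays undefined.
    val afterUserInsert = ToyIdentityTable().appendExplicit(Seq(10L, 20L))
    assert(afterUserInsert.highWaterMark.isEmpty)

    // Only system-generated values define the high water mark.
    val afterGenerated = afterUserInsert.appendGenerated(3)
    assert(afterGenerated.highWaterMark.isDefined)
  }
}
```

The real test checks the same property against an actual Delta table through each append path (DataFrame v2, DataFrame v1, and SQL `INSERT INTO`).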

## How was this patch tested?

New tests pass.

## Does this PR introduce _any_ user-facing changes?
No.
c27kwan authored and huan233usc committed Jan 17, 2025
1 parent a022c54 commit b6e8e6b
Showing 1 changed file with 36 additions and 0 deletions.
@@ -419,6 +419,42 @@ trait IdentityColumnIngestionSuiteBase extends IdentityColumnTestUtils {
    }
  }

  test("Appending from a source table with a high water mark should not update" +
    " the target table's high water mark") {
    withSrcAndDestTables(
      isSrcDataSubsetOfTgt = false,
      positiveStep = true,
      expectValidHighWaterMark = false) { (srcTblName, tgtTblName) =>
      val tgtDeltaLog = DeltaLog.forTable(spark, TableIdentifier(tgtTblName))
      // dataframe v2
      spark.table(srcTblName).writeTo(tgtTblName).append()
      assert(getHighWaterMark(tgtDeltaLog.update(), colName = "id").isEmpty,
        "High watermark should not be set for user inserted data.")

      // v1
      spark.table(srcTblName).write.format("delta").mode("append").saveAsTable(tgtTblName)
      assert(getHighWaterMark(tgtDeltaLog.update(), colName = "id").isEmpty,
        "High watermark should not be set for user inserted data.")

      spark.table(srcTblName).write.insertInto(tgtTblName)
      assert(getHighWaterMark(tgtDeltaLog.update(), colName = "id").isEmpty,
        "High watermark should not be set for user inserted data.")

      // SQL
      sql(s"INSERT INTO $tgtTblName SELECT * FROM $srcTblName")
      assert(getHighWaterMark(tgtDeltaLog.update(), colName = "id").isEmpty,
        "High watermark should not be set for user inserted data.")

      sql(s"INSERT INTO $tgtTblName BY NAME SELECT * FROM $srcTblName")
      assert(getHighWaterMark(tgtDeltaLog.update(), colName = "id").isEmpty,
        "High watermark should not be set for user inserted data.")

      sql(s"INSERT INTO $tgtTblName(id, value) SELECT id, value FROM $srcTblName")
      assert(getHighWaterMark(tgtDeltaLog.update(), colName = "id").isEmpty,
        "High watermark should not be set for user inserted data.")
    }
  }

  for {
    cdfEnabled <- DeltaTestUtils.BOOLEAN_DOMAIN
    isSrcDataSubsetOfTgt <- DeltaTestUtils.BOOLEAN_DOMAIN