Spark can't read its own write after another one commits data to the iceberg table #10919

Closed
1 of 3 tasks
chenzl25 opened this issue Aug 12, 2024 · 2 comments
Labels
bug Something isn't working

chenzl25 commented Aug 12, 2024

Apache Iceberg version

1.5.2

Query engine

Spark

Please describe the bug 🐞

How to reproduce:

  • In spark-sql, run CREATE TABLE demo.demo_db.t_bug(a int) TBLPROPERTIES ('format-version'='2')
  • In spark-sql, run insert into demo.demo_db.t_bug values (1)
  • Another SDK (icelake) runs insert into demo.demo_db.t_bug values (2)
  • After that, something strange happens:
  • select * from demo.demo_db.t_bug.files reports 2 Parquet files, but select * from demo.demo_db.t_bug returns only 1 row. I also ran 2 time travel queries, and both snapshots return 1 row. The result for snapshot id 6579104674932036030 is expected, but the result for snapshot id 6579104674932036031 is not.
image
  • BTW, other query engines such as ClickHouse and DuckDB read 2 rows, so I think something is wrong in Spark.
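
For reference, the checks in the steps above can be collected into one Spark SQL script. This is only a sketch: the exact time travel statements used in the report are not shown in the text, so the VERSION AS OF form (available in Spark 3.3 and later) is an assumption, and so are the column lists selected from the files and snapshots metadata tables.

```sql
-- Steps 1 and 2: create the table and write the first row from Spark.
CREATE TABLE demo.demo_db.t_bug (a int) TBLPROPERTIES ('format-version'='2');
INSERT INTO demo.demo_db.t_bug VALUES (1);

-- Step 3 happens outside Spark: icelake appends the row (2).

-- Data files tracked by the current snapshot (the report sees 2 Parquet files).
SELECT file_path, record_count FROM demo.demo_db.t_bug.files;

-- Plain scan of the current snapshot (the report sees only 1 row).
SELECT * FROM demo.demo_db.t_bug;

-- Time travel by snapshot id (assumed syntax; the report sees 1 row for each).
SELECT * FROM demo.demo_db.t_bug VERSION AS OF 6579104674932036030;
SELECT * FROM demo.demo_db.t_bug VERSION AS OF 6579104674932036031;

-- Snapshot lineage, useful for comparing what each writer committed.
SELECT snapshot_id, parent_id, operation FROM demo.demo_db.t_bug.snapshots;
```

Comparing the record_count values from the files table with the rows returned by the plain scan makes the mismatch visible without relying on a screenshot.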

I have also attached the metadata and data directories of table t_bug for anyone interested in this issue.

t_bug.zip

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@chenzl25 chenzl25 added the bug Something isn't working label Aug 12, 2024
@lurnagao-dahua
Contributor

What is the result of running this query in Spark SQL?
select * from demo.demo_db.t_bug

@chenzl25
Author

Never mind. I found that this is a bug in icelake: it doesn't follow the Iceberg spec. See icelake-io/icelake#280. I'll close this issue.
