Skip to content

Is it viable to implement max-partitions-per-scan option on Iceberg connector, as like Hive connector? #20714

Answered by raunaqmorarka
okayhooni asked this question in Q&A
Discussion options

You must be logged in to vote

Unlike Hive, in Iceberg there isn't a separate metadata call for fetching partitions of a table that this functionality can be built on top of. Since we enumerate splits lazily and get them through iceberg library, we find out about partitions at execution time as the files are processed.
Just the number of partitions doesn't seem like a great metric for blocking things as it doesn't take into account the amount of data per partition, the number of columns queried and the effect of predicate pushdown in orc/parquet on the worker. It also ignores unpartitioned tables completely. I think query.max-scan-physical-bytes is a good substitute here. Maybe we can add an SPI in ConnectorMetadata (D…

Replies: 2 comments

Comment options

You must be logged in to vote
0 replies
Answer selected by okayhooni
Comment options

You must be logged in to vote
0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
2 participants
Converted from issue

This discussion was converted from issue #20543 on February 15, 2024 04:49.