Is it viable to implement a max-partitions-per-scan
option in the Iceberg connector, like the Hive connector's?
#20714
-
We need a way to prevent overly heavy scan queries on Iceberg tables, as we had on Hive tables with the `max-partitions-per-scan` option. I think it is impossible to transplant the scan-partitions constraint logic from the Hive connector to the Iceberg connector as-is, but I guess it may still be possible to implement a scan-partitions constraint in some other way. That said, people who have a better understanding of Trino and Iceberg than I do might have had reasons for not implementing this option. @findepi , @wendigo , @raunaqmorarka , @mosabua , @findinpath
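For reference, the Hive connector's limit is a catalog configuration property; a minimal sketch of how it is typically set (the value shown here is illustrative, not a recommendation):

```properties
# etc/catalog/hive.properties (illustrative value)
hive.max-partitions-per-scan=100000
```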
Replies: 2 comments
-
I implemented some custom logic (TBLPROPERTIES example):

```java
// (snippet of custom logic) ...
Splitter.on(",").splitToStream(table.getStorageProperties().get(QUERY_PARTITION_FILTER_APPROXIMATE_MAX_SCAN_FIELD_TO_DAYS_MAP_KEY))
        .map(String::trim)
        .forEach(fieldKeyValuePairStr -> {
            String[] fieldKeyValuePair = fieldKeyValuePairStr.split(":");
            mandatoryFilterPartitionMaxScanFieldToTsMap.put(
                    fieldKeyValuePair[0].trim(),
                    Integer.parseInt(fieldKeyValuePair[1].trim()) * QUERY_PARTITION_FILTER_TIMESTAMPS_PER_DAY);
        });
// ...
SortedRangeSet currentValueSet = (SortedRangeSet) valueDomain.getValues();
currentValueSet.getRanges().getOrderedRanges().forEach(range -> {
    range.getLowValue().ifPresentOrElse(
            lowValue -> {
                if (range.isSingleValue()) {
                    mandatoryFilterPartitionMaxScanFieldToTsMap.put(
                            columnName,
                            mandatoryFilterPartitionMaxScanFieldToTsMap.get(columnName) - QUERY_PARTITION_FILTER_TIMESTAMPS_PER_DAY);
                }
                else {
                    long lowTimestamp = ((LongTimestampWithTimeZone) lowValue).getEpochMillis();
                    validateMandatoryPartitionScanCountsWithLeftLowValue(lowTimestamp, columnName, range, mandatoryFilterPartitionMaxScanFieldToTsMap, requiredOption, table);
                }
            },
            // ...
```
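The snippets above elide the surrounding Trino SPI plumbing. As a dependency-free sketch of the same idea, the code below parses a per-column `field:maxDays` property and charges each predicate range against that budget. The class, method names, and property format are my assumptions for illustration, not Trino or Iceberg APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a per-column "scan budget" derived from a table
// property such as "event_ts:7,load_dt:30", charged by predicate ranges.
public class PartitionScanBudget
{
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Parse "event_ts:7,load_dt:30" into a column -> budget-in-millis map.
    static Map<String, Long> parseBudgets(String property)
    {
        Map<String, Long> budgets = new HashMap<>();
        for (String pair : property.split(",")) {
            String[] kv = pair.trim().split(":");
            budgets.put(kv[0].trim(), Long.parseLong(kv[1].trim()) * MILLIS_PER_DAY);
        }
        return budgets;
    }

    // Deduct the width of a predicate range [lowMillis, highMillis] on `column`;
    // a single value counts as one day, mirroring the isSingleValue() branch above.
    static void charge(Map<String, Long> budgets, String column, long lowMillis, long highMillis)
    {
        long cost = (lowMillis == highMillis) ? MILLIS_PER_DAY : (highMillis - lowMillis);
        long remaining = budgets.get(column) - cost;
        if (remaining < 0) {
            throw new IllegalArgumentException("query scans too many partitions on " + column);
        }
        budgets.put(column, remaining);
    }
}
```

With a 7-day budget on `event_ts`, a 3-day range leaves 4 days of budget, and a further 5-day range would be rejected.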
Unlike Hive, in Iceberg there isn't a separate metadata call for fetching the partitions of a table that this functionality could be built on top of. Since we enumerate splits lazily and get them through the Iceberg library, we find out about partitions at execution time, as the files are processed.
The number of partitions alone doesn't seem like a great metric for blocking queries: it doesn't take into account the amount of data per partition, the number of columns queried, or the effect of predicate pushdown in ORC/Parquet on the worker. It also ignores unpartitioned tables completely. I think `query.max-scan-physical-bytes` is a good substitute here. Maybe we can add an SPI in ConnectorMetadata (D…
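For anyone looking for the suggested substitute, it is a Trino coordinator configuration property; a minimal sketch (the limit value is illustrative):

```properties
# etc/config.properties (illustrative limit)
query.max-scan-physical-bytes=1TB
```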