-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Iceberg] Fix bugs in Iceberg statistics caching #24480
[Iceberg] Fix bugs in Iceberg statistics caching #24480
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix, LGTM. Just one little nit.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/hive/TestIcebergHiveStatistics.java
Outdated
Show resolved
Hide resolved
f78fc09
to
0a1a136
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a small nit.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/hive/TestIcebergHiveStatistics.java
Outdated
Show resolved
Hide resolved
In the case of partial miss on the StatisticsFileCache, the loaded statistics were not combined with the cached statistics, causing discrepancies in query planning
Previously, only the cache stats were available through the JMX plugin because only the CacheStatsMBean was exported. The file size and column count distributions were not available. This fixes the issue by problem by exporting the StatisticsFileCache object instead and embedding the cache stats object
723d643
0a1a136
to
723d643
Compare
@@ -658,6 +658,7 @@ private Map<Integer, ColumnStatistics> loadStatisticsFile(IcebergTableHandle tab | |||
statisticsFileCache.put(new StatisticsFileCacheKey(file, key), value); | |||
finalResult.put(key, value); | |||
}); | |||
finalResult.putAll(cachedStats); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Let's add a comment // Include already cached stats (those that were not missing)
statistics = getTableStatistics(queryRunner, session, "lineitem"); | ||
RuntimeMetric partialMiss = runtimeStats.getMetrics().keySet().stream().filter(name -> name.contains("PartialMiss")).findFirst() | ||
.map(runtimeStats::getMetric) | ||
.orElseThrow(() -> new RuntimeException("partial miss on statistics cache should have occurred")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead lets use Assert.fail
and log the runtimeStats
TransactionId txid = getQueryRunner().getTransactionManager().beginTransaction(false); | ||
Session txnSession = session.beginTransactionId(txid, getQueryRunner().getTransactionManager(), new AllowAllAccessControl()); | ||
Map<String, ColumnHandle> columnHandles = getColumnHandles(table, txnSession); | ||
Metadata meta = queryRunner.getMetadata(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Use full variable name metadata
.ifPresent(stat -> assertEquals(32, runtimeStats.getMetric(stat).getSum())); | ||
runtimeStats.getMetrics().keySet().stream().filter(name -> name.contains("PuffinFileSize")).findFirst() | ||
.ifPresent(stat -> assertTrue(runtimeStats.getMetric(stat).getSum() > 1024)); | ||
// get them again to trigger retrieval of _some_ cached statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert that eviction count was more than 0 ? This proves that the cache was maxed out and not all read statistics have been cached
Description
This PR has two commits which fixes two minor bugs in statistics file caching. The first is related to returning wrong statistics on partial cache misses. The second as to do with missing stats in the reported JMX statistics on the StatisticsFileCache object.
Motivation and Context
Bug fixes. See commit messages for details
Impact
Test Plan
Contributor checklist
Release Notes