Apache Pinot Release 0.12.0
xiangfu0 released this 19 Jan 19:45
What's Changed
Major updates
- Force commit consuming segments by @sajjad-moradi in #9197
- add a freshness based consumption status checker by @jadami10 in #9244
- Add metrics to track controller segment download and upload requests in progress by @gviedma in #9258
- Adding endpoint to download local log files for each component by @xiangfu0 in #9259
- [Feature] Add an option to search input files recursively in ingestion job. The default is set to true to be backward compatible. by @61yao in #9265
- add query cancel APIs on controller backed by those on brokers by @klsince in #9276
- Add Spark Job Launcher tool by @KKcorps in #9288
- Enable Consistent Data Push for Standalone Segment Push Job Runners by @yuanbenson in #9295
- Allow server to directly return the final aggregation result by @Jackie-Jiang in #9304
- TierBasedSegmentDirectoryLoader to keep segments in multi-datadir by @klsince in #9306
- Adaptive Server Selection by @vvivekiyer in #9311
- [Feature] Support IsDistinctFrom and IsNotDistinctFrom by @61yao in #9312
- Allow ingestion of errored records with incorrect datatype by @KKcorps in #9320
- Allow setting custom time boundary for hybrid table queries by @saurabhd336 in #9356
- skip late cron job with max allowed delay by @klsince in #9372
- Do not allow implicit cast for BOOLEAN and TIMESTAMP by @Jackie-Jiang in #9385
- Add missing properties in CSV plugin by @KKcorps in #9399
- set MDC so that one can route minion task logs to separate files cleanly by @klsince in #9400
- Add a new API to fix segment date time in metadata by @KKcorps in #9413
- Update get bytes to return raw bytes of string and support getBytesMV by @61yao in #9441
- Exposing consumer's record lag in /consumingSegmentsInfo by @navina in #9515
- Do not create dictionary for high-cardinality columns by @KKcorps in #9527
- get task runtime configs tracked in Helix by @klsince in #9540
- Add more options to json index by @Jackie-Jiang in #9543
- add SegmentTierAssigner and refine restful APIs to get segment tier info by @klsince in #9598
- Add segment level debug API by @saurabhd336 in #9609
- Add record availability lag for Kafka connector by @navina in #9621
- notify servers that need to move segments to new tiers via SegmentReloadMessage by @klsince in #9624
- Allow to configure multi-datadirs as instance configs and a Quickstart example about them by @klsince in #9705
- Customize stopword for Lucene Index by @jasperjiaguo in #9708
- Add memory optimized dimension table by @KKcorps in #9802
- ADLS file system upgrade by @xiangfu0 in #9855
- Added Delete Schema/Table pinot admin commands by @bagipriyank in #9857
- Adding new ADLSPinotFS auth type: DEFAULT by @xiangfu0 in #9860
- Add rate limit to Kinesis requests by @KKcorps in #9863
- Adding configs for zk client timeout by @xiangfu0 in #9975
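For readers unfamiliar with the IsDistinctFrom and IsNotDistinctFrom comparisons listed above (#9312), the following is a minimal sketch of their standard SQL null-safe semantics. It uses Python's `None` to stand in for SQL NULL. The function names are hypothetical and the code is an illustration only, not Pinot's implementation.

```python
from typing import Any, Optional


def is_distinct_from(a: Optional[Any], b: Optional[Any]) -> bool:
    """Null-safe inequality: NULL is treated as a comparable value."""
    if a is None or b is None:
        # Distinct only when exactly one side is NULL.
        return (a is None) != (b is None)
    return a != b


def is_not_distinct_from(a: Optional[Any], b: Optional[Any]) -> bool:
    """Null-safe equality: the negation of is_distinct_from."""
    return not is_distinct_from(a, b)
```

Unlike a plain `=` or `<>` comparison in SQL, which yields NULL when either operand is NULL, these two predicates always evaluate to true or false.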
Other features/changes
- Show most recent scheduling errors by @satishwaghela in #9161
- Do not use aggregation result for distinct query in IntermediateResultsBlock by @Jackie-Jiang in #9262
- Emit metrics for ratio of actual consumption rate to rate limit in realtime tables by @sajjad-moradi in #9201
- add metrics entry offlineTableCount by @walterddr in #9270
- refine query cancel resp msg by @klsince in #9242
- add @ManualAuthorization annotation for non-standard endpoints by @apucher in #9252
- Optimize ser/de to avoid using output stream by @Jackie-Jiang in #9278
- Add Support for Covariance Function by @SabrinaZhaozyf in #9236
- Throw an exception when MV columns are present in the order-by expression list in selection order-by only queries by @somandal in #9078
- Improve server query cancellation and timeout checking during execution by @jasperjiaguo in #9286
- Add capabilities to ingest from another stream without disabling the realtime table by @sajjad-moradi in #9289
- Add minMaxInvalid flag to avoid unnecessary needPreprocess by @npawar in #9238
- Add array cardinality function by @walterddr in #9300
- Add support for custom null values in CSV record reader by @KKcorps in #9318
- Infer parquet reader type based on file metadata by @saurabhd336 in #9294
- Include fmpp plugin module inside the src assembly file by @xiangfu0 in #9321
- Add Support for Cast Function on MV Columns by @SabrinaZhaozyf in #9296
- [Feature] Not Operator Transformation by @61yao in #9330
- Handle null string in CSV decoder by @KKcorps in #9340
- [Feature] Not scalar function by @61yao in #9338
- Add support for EXTRACT syntax and converts it to appropriate Pinot expression by @tanmesh in #9184
- Add support for Auth in controller requests in java query client by @KKcorps in #9230
- delete all related minion task metadata when deleting a table by @zhtaoxiang in #9339
- BloomFilterRule should only recommend for supported column type by @yuanbenson in #9364
- Support all the types in ParquetNativeRecordReader by @xiangfu0 in #9352
- Improve segment name check in metadata push by @zhtaoxiang in #9359
- Allow expression transformer to continue on error by @xiangfu0 in #9376
- Enhance `and` filter predicate evaluation efficiency by @jasperjiaguo in #9336
- Deprecate instanceId Config For Broker/Minion Specific Configs by @ankitsultana in #9308
- Optimize combine operator to fully utilize threads by @Jackie-Jiang in #9387
- Terminate the query after plan generation if timeout by @jasperjiaguo in #9386
- [Feature] Support Coalesce for Column Names by @61yao in #9327
- Disable logging for interrupted exceptions in kinesis by @KKcorps in #9405
- Benchmark thread cpu time by @jasperjiaguo in #9408
- Use ISODateTimeFormat as default for SIMPLE_DATE_FORMAT by @KKcorps in #9378
- Extract the common logic for upsert metadata manager by @Jackie-Jiang in #9435
- Make minion task metadata manager methods more generic by @saurabhd336 in #9436
- Always pass clientId to kafka's consumer properties by @navina in #9444
- Refine IndexHandler methods a bit to make them reentrant by @klsince in #9440
- use MinionEventObserver to track finer grained task progress status on worker by @klsince in #9432
- Allow spaces in input file paths by @KKcorps in #9426
- Add support for gracefully handling the errors while transformations by @KKcorps in #9377
- Cache Deleted Segment Names in Server to Avoid SegmentMissingError by @ankitsultana in #9423
- Handle Invalid timestamps by @KKcorps in #9355
- refine minion worker event observer to track finer grained progress for tasks by @klsince in #9449
- spark-connector should use v2/brokers endpoint by @itschrispeck in #9451
- Remove netty server query support from presto-pinot-driver to remove pinot-core and pinot-segment-local dependencies by @xiangfu0 in #9455
- Adaptive Server Selection: Address pending review comments by @vvivekiyer in #9462
- track progress from within segment processor framework by @klsince in #9457
- Decouple ser/de from DataTable by @Jackie-Jiang in #9468
- collect file info like mtime, length while listing files for free by @klsince in #9466
- Extract record keys, headers and metadata from Stream sources by @navina in #9224
- [pinot-spark-connector] Bump spark connector max inbound message size by @cbalci in #9475
- refine the minion task progress api a bit by @klsince in #9482
- add parsing for AT TIME ZONE by @agavra in #9477
- Eliminate explosion of metrics due to gapfill queries by @elonazoulay in #9490
- ForwardIndexHandler: Change compressionType during segmentReload by @vvivekiyer in #9454
- Introduce Segment AssignmentStrategy Interface by @GSharayu in #9309
- Add query interruption flag check to broker groupby reduction by @jasperjiaguo in #9499
- adding optional client payload by @walterddr in #9465
- [feature] distinct from scalar functions by @61yao in #9486
- Check data table version on server only for null handling by @Jackie-Jiang in #9508
- Add docId and column name to segment read exception by @KKcorps in #9512
- Sort scanning based operators by cardinality in AndDocIdSet evaluation by @jasperjiaguo in #9420
- Do not fail CI when codecov upload fails by @Jackie-Jiang in #9522
- [Upsert] persist validDocsIndex snapshot for Pinot upsert optimization by @deemoliu in #9062
- broker filter by @dongxiaoman in #9391
- [feature] coalesce scalar by @61yao in #9487
- [GHA] add cache timeout by @walterddr in #9524
- Optimize PinotHelixResourceManager.hasTable() by @Jackie-Jiang in #9526
- Include exception when upsert metadata manager cannot be created by @Jackie-Jiang in #9532
- allow to config task expire time by @klsince in #9530
- expose task finish time via debug API by @klsince in #9534
- Remove the wrong warning log in KafkaPartitionLevelConsumer by @Jackie-Jiang in #9536
- starting http server for minion worker conditionally by @klsince in #9542
- Make StreamMessage generic and a bug fix by @vvivekiyer in #9544
- Improve primary key serialization performance by @KKcorps in #9538
- [Upsert] Skip removing upsert metadata when shutting down the server by @Jackie-Jiang in #9551
- add array element at function by @walterddr in #9554
- Handle the case when enableNullHandling is true and an aggregation function is used w/ a column that has an empty null bitmap by @nizarhejazi in #9566
- Support segment storage format without forward index by @somandal in #9333
- Adding SegmentNameGenerator type inference if not explicitly set in config by @timsants in #9550
- add version information to JMX metrics & component logs by @agavra in #9578
- remove unused RecordTransform/RecordFilter classes by @agavra in #9607
- Support rewriting forward index upon changing compression type for existing raw MV column by @vvivekiyer in #9510
- Support Avro's Fixed data type by @sajjad-moradi in #9642
- [feature] [kubernetes] add loadBalancerSourceRanges to service-external.yaml for controller and broker by @jameskelleher in #9494
- Limit up to 10 unavailable segments to be printed in the query exception by @Jackie-Jiang in #9617
- remove more unused filter code by @agavra in #9620
- Do not cache record reader in segment by @Jackie-Jiang in #9604
- make first part of user agent header configurable by @rino-kadijk in #9471
- optimize `order by sorted ASC, unsorted` and `order by DESC` cases by @gortiz in #8979
- Enhance cluster config update API to handle non-string values properly by @Jackie-Jiang in #9635
- Reverts recommender REST API back to PUT (reverts PR #9326) by @yuanbenson in #9638
- Remove invalid pruner names from server config by @Jackie-Jiang in #9646
- Using `usageHelp` instead of deprecated `help` in picocli commands by @navina in #9608
- Handle unique query id on server by @Jackie-Jiang in #9648
- stateless group marker missing several by @walterddr in #9673
- Support reloading consuming segment using force commit by @Jackie-Jiang in #9640
- Improve star-tree to use star-node when the predicate matches all the non-star nodes by @Jackie-Jiang in #9667
- add FetchPlanner interface to decide what column index to prefetch by @klsince in #9668
- Improve star-tree traversal using ArrayDeque by @Jackie-Jiang in #9688
- Handle errors in combine operator by @Jackie-Jiang in #9689
- return different error code if old version is not on master by @SabrinaZhaozyf in #9686
- Support creating dictionary at runtime for an existing column by @vvivekiyer in #9678
- check mutable segment explicitly instead of checking existence of indexDir by @klsince in #9718
- Remove leftover file before downloading segmentTar by @npawar in #9719
- add index key and size map to segment metadata by @walterddr in #9712
- Use ideal state as source of truth for segment existence by @Jackie-Jiang in #9735
- Close Filesystem on exit with Minion Tasks by @KKcorps in #9681
- render the tables list even as the table sizes are loading by @jadami10 in #9741
- Add Support for IP Address Function by @SabrinaZhaozyf in #9501
- bubble up error messages from broker by @agavra in #9754
- Add support to disable the forward index for existing columns by @somandal in #9740
- show table metadata info in aggregate index size form by @walterddr in #9733
- Preprocess immutable segments from REALTIME table conditionally when loading them by @klsince in #9772
- revert default timeout nano change in QueryConfig by @agavra in #9790
- AdaptiveServerSelection: Update stats for servers that have not responded by @vvivekiyer in #9801
- Add null value index for default column by @KKcorps in #9777
- [MergeRollupTask] include partition info into segment name by @zhtaoxiang in #9815
- Adding a consumer lag as metric via a periodic task in controller by @navina in #9800
- Deserialize Hyperloglog objects more optimally by @priyen in #9749
- Download offline segments from peers by @wirybeaver in #9710
- Thread Level Usage Accounting and Query Killing on Server by @jasperjiaguo in #9727
- Add max merger and min mergers for partial upsert by @deemoliu in #9665
- #9518 added pinot helm 0.2.6 with secure version pinot 0.11.0 by @bagipriyank in #9519
- Combine the read access for replication config by @snleee in #9849
- add v1 ingress in helm chart by @jhisse in #9862
- Optimize AdaptiveServerSelection for replicaGroup based routing by @vvivekiyer in #9803
- Do not sort the instances in InstancePartitions by @Jackie-Jiang in #9866
- Merge new columns in existing record with default merge strategy by @navina in #9851
- Support disabling dictionary at runtime for an existing column by @vvivekiyer in #9868
- support BOOL_AND and BOOL_OR aggregate functions by @agavra in #9848
- Use Pulsar AdminClient to delete unused subscriptions by @navina in #9859
- add table sort function for table size by @jadami10 in #9844
- In Kafka consumer, seek offset only when needed by @Jackie-Jiang in #9896
- fallback if no broker found for the specified table name by @klsince in #9914
- Allow liveness check during server shutting down by @Jackie-Jiang in #9915
- Allow segment upload via Metadata in MergeRollup Minion task by @KKcorps in #9825
- Add back the Helix workaround for missing IS change by @Jackie-Jiang in #9921
- Allow uploading realtime segments via CLI by @KKcorps in #9861
- Add capability to update and delete table config via CLI by @KKcorps in #9852
- default to TAR if push mode is not set by @klsince in #9935
- load startree index via segment reader interface by @klsince in #9828
- Allow collections for MV transform functions by @saurabhd336 in #9908
- Construct new IndexLoadingConfig when loading completed realtime segments by @vvivekiyer in #9938
- Make GET /tableConfigs backwards compatible in case schema does not match raw table name by @timsants in #9922
- feat: add compressed file support for ORCRecordReader by @etolbakov in #9884
- Add Variance and Standard Deviation Aggregation Functions by @snleee in #9910
- enable MergeRollupTask on realtime tables by @zhtaoxiang in #9890
- Update cardinality when converting raw column to dict based by @vvivekiyer in #9875
- Add back auth token for UploadSegmentCommand by @timsants in #9960
- Improving gz support for avro record readers by @snleee in #9951
- Default column handling of noForwardIndex and regeneration of forward index on reload path by @somandal in #9810
- [Feature] Support coalesce literal by @61yao in #9958
- Ability to initialize S3PinotFs with serverSideEncryption properties when passing client directly by @npawar in #9988
- handle pending minion tasks properly when getting the task progress status by @klsince in #9911
- allow gauge stored in metric registry to be updated by @zhtaoxiang in #9961
- support case-insensitive query options in SET syntax by @agavra in #9912
- pin versions-maven-plugin to 2.13.0 by @jadami10 in #9993
- Pulsar Connection handler should not spin up a consumer / reader by @navina in #9893
- Handle in-memory segment metadata for index checking by @Jackie-Jiang in #10017
- Support the cross-account access using IAM role for S3 PinotFS by @snleee in #10009
- report minion task metadata last update time as metric by @zhtaoxiang in #9954
- support SKEWNESS and KURTOSIS aggregates by @agavra in #10021
- emit minion task generation time and error metrics by @zhtaoxiang in #10026
- Use the same default time value for all replicas by @Jackie-Jiang in #10029
- Reduce the number of segments to wait for convergence when rebalancing by @saurabhd336 in #10028
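Several entries above add COALESCE support (#9327, #9487, #9958). As a rough illustration of the standard SQL behaviour that function is named after, here is a minimal Python sketch in which `None` stands in for SQL NULL. The helper name is hypothetical and nothing here reflects Pinot's internal implementation or its exact type rules.

```python
from typing import Any, Optional


def coalesce(*values: Optional[Any]) -> Optional[Any]:
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None
```

Note that falsy values such as `0` or an empty string are still returned, because only NULL (here `None`) is skipped.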
UI Update & Improvement
- Allow hiding query console tab based on cluster config (#9261)
- Allow hiding pinot broker swagger UI by config (#9343)
- Add UI to show fine-grained minion task progress (#9488)
- Add UI to track segment reload progress (#9521)
- Show minion task runtime config details in UI (#9652)
- Redefine the segment status (#9699)
- Show an option to reload the segments during edit schema (#9762)
- Load schema UI async (#9781)
- Fix blank screen when redirect to unknown app route (#9888)
Multi-Stage Query Engine
New join semantics support
- Left join (#9466)
- In-equi join (#9448)
- Full join (#9907)
- Right join (#9907)
- Semi join (#9367)
- Using keyword (#9373)
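To make two of the join types listed above concrete, the sketch below shows the standard relational semantics of a left join and a semi join over small in-memory row lists. It is an assumption-laden illustration in plain Python: the function names and the sample rows are hypothetical, `None` stands in for SQL NULL, and it does not describe how the multi-stage engine executes joins.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep every left row; unmatched rows get None for right-side columns."""
    # Right-side column names other than the join key. If a name collides
    # with a left column, the right value overwrites it in this sketch.
    right_cols = sorted({c for r in right for c in r if c != key})
    index: Dict[Any, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out: List[Row] = []
    for l in left:
        matches = index.get(l[key], [])
        if not matches:
            out.append({**l, **{c: None for c in right_cols}})
        for r in matches:
            out.append({**l, **{c: r.get(c) for c in right_cols}})
    return out


def semi_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep left rows that have at least one match; add no right columns."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


orders = [{"id": 1, "cust": "a"}, {"id": 2, "cust": "b"}]
customers = [{"cust": "a", "city": "X"}]
joined = left_join(orders, customers, "cust")
filtered = semi_join(orders, customers, "cust")
```

With these sample rows, the left join keeps both orders and fills `city` with `None` for customer `b`, while the semi join returns only the order for customer `a`.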
New SQL semantics support
- Having (#9274)
- Order by (#9279)
- In/NotIn clause (#9374)
- Cast (#9384)
- Like/Regexp_like (#9654)
- Range predicate (#9445)
Performance enhancement
- Thread safe query planning (#9344)
- Partial query execution and round robin scheduling (#9753)
- Improve data table serde (#9731)
Library version upgrade
- Upgrade h3 lib from 3.7.2 to 4.0.0 to lower glibc requirement (#9335)
- Upgrade ZK version to 3.6.3 (#9612)
- Upgrade snakeyaml from 1.30 to 1.33 (#9464)
- Upgrade RoaringBitmap from 0.9.28 to 0.9.35 (#9730)
- Upgrade spotless-maven-plugin from 2.9.0 to 2.28.0 (#9877)
- Upgrade decode-uri-component from 0.2.0 to 0.2.2 (#9941)
Bug fixes
- Fix bug with logging request headers by @abhs50 in #9247
- Fix a UT that only shows up on host with more cores by @klsince in #9257
- Fix message count by @Jackie-Jiang in #9271
- Fix issue with auth AccessType in Schema REST endpoints by @sajjad-moradi in #9293
- Fix PerfBenchmarkRunner to skip the tmp dir by @Jackie-Jiang in #9298
- Fix thrift deserializer thread safety issue by @saurabhd336 in #9299
- Fix transformation to string for BOOLEAN and TIMESTAMP by @Jackie-Jiang in #9287
- [hotfix] Add VARBINARY column to switch case branch by @walterddr in #9313
- Fix annotation for "/recommender" endpoint by @sajjad-moradi in #9326
- Fix jdk8 build issue due to missing pom dependency by @somandal in #9351
- Fix pom to use pinot-common-jdk8 for pinot-connector jdk8 java client by @somandal in #9353
- Fix log to reflect job type by @KKcorps in #9381
- [Bugfix] schema update bug fix by @MeihanLi in #9382
- fix histogram null pointer exception by @jasperjiaguo in #9428
- Fix thread safety issues with SDF (WIP) by @saurabhd336 in #9425
- Bug fix: failure status in ingestion jobs doesn't reflect in exit code by @KKcorps in #9410
- Fix skip segment logic in MinMaxValueBasedSelectionOrderByCombineOperator by @Jackie-Jiang in #9434
- Fix the bug of hybrid table request using the same request id by @Jackie-Jiang in #9443
- Fix the range check for range index on raw column by @Jackie-Jiang in #9453
- Fix Data-Correctness Bug in GTE Comparison in BinaryOperatorTransformFunction by @ankitsultana in #9461
- extend PinotFS impls with listFilesWithMetadata and some bugfix by @klsince in #9478
- fix null transform bound check by @walterddr in #9495
- Fix JsonExtractScalar when no value is extracted by @Jackie-Jiang in #9500
- Fix AddTable for realtime tables by @npawar in #9506
- Fix some type convert scalar functions by @Jackie-Jiang in #9509
- fix spammy logs for ConfluentSchemaRegistryRealtimeClusterIntegrationTest [MINOR] by @agavra in #9516
- Fix timestamp index on column of preserved key by @Jackie-Jiang in #9533
- Fix record extractor when ByteBuffer can be reused by @Jackie-Jiang in #9549
- Fix explain plan ALL_SEGMENTS_PRUNED_ON_SERVER node by @somandal in #9572
- Fix time validation when data type needs to be converted by @Jackie-Jiang in #9569
- UI: fix incorrect task finish time by @jayeshchoudhary in #9557
- Fix the bug where uploaded segments cannot be deleted on real-time table by @Jackie-Jiang in #9579
- [bugfix] correct the dir for building segments in FileIngestionHelper by @zhtaoxiang in #9591
- Fix NonAggregationGroupByToDistinctQueryRewriter by @Jackie-Jiang in #9605
- fix distinct result return by @walterddr in #9582
- Fix GcsPinotFS by @lfernandez93 in #9556
- fix DataSchema thread-safe issue by @walterddr in #9619
- Bug fix: Add missing table config fetch for /tableConfigs list all by @timsants in #9603
- Fix re-uploading segment when the previous upload failed by @Jackie-Jiang in #9631
- Fix string split which should be on whole separator by @Jackie-Jiang in #9650
- Fix server request sent delay to be non-negative by @Jackie-Jiang in #9656
- bugfix: Add missing BIG_DECIMAL support for GenericRow serde by @timsants in #9661
- Fix extra restlet resource test which should be stateless by @Jackie-Jiang in #9674
- AdaptiveServerSelection: Fix timer by @vvivekiyer in #9697
- fix PinotVersion to be compatible with prometheus by @agavra in #9701
- Fix the setup for ControllerTest shared cluster by @Jackie-Jiang in #9704
- [hotfix]groovy class cache leak by @walterddr in #9716
- Fix TIMESTAMP index handling in SegmentMapper by @Jackie-Jiang in #9722
- Fix the server admin endpoint cache to reflect the config changes by @Jackie-Jiang in #9734
- [bugfix] fix case-when issue by @walterddr in #9702
- [bugfix] Let StartControllerCommand also handle "pinot.zk.server", "pinot.cluster.name" in default conf/pinot-controller.conf by @thangnd197 in #9739
- [hotfix] semi-join opt by @walterddr in #9779
- Fixing the rebalance issue for real-time table with tier by @snleee in #9780
- UI: show segment debug details when segment is in bad state by @jayeshchoudhary in #9700
- Fix the replication in segment assignment strategy by @GSharayu in #9816
- fix potential fd leakage for SegmentProcessorFramework by @klsince in #9797
- Fix NPE when reading ZK address from controller config by @Jackie-Jiang in #9751
- have query table list show search bar; fix InstancesTables filter by @jadami10 in #9742
- [pinot-spark-connector] Fix empty data table handling in GRPC reader by @cbalci in #9837
- [bugfix] fix mergeRollupTask metrics by @zhtaoxiang in #9864
- Bug fix: Get correct primary key count by @KKcorps in #9876
- Fix issues for realtime table reload by @Jackie-Jiang in #9885
- UI: fix segment status color remains same in different table page by @jayeshchoudhary in #9891
- Fix bloom filter creation on BYTES by @Jackie-Jiang in #9898
- [hotfix] broker selection not using table name by @walterddr in #9902
- Fix race condition when 2 segment upload occurred for the same segment by @jackjlli in #9905
- fix timezone_hour/timezone_minute functions by @agavra in #9949
- [Bugfix] Move brokerId extraction to BaseBrokerStarter by @jackjlli in #9965
- Fix ser/de for StringLongPair by @Jackie-Jiang in #9985
- bugfix dir check for HadoopPinotFS.copyFromLocalDir by @klsince in #9979
- Bugfix: Use correct exception import in TableRebalancer. by @mayankshriv in #10025
- Fix NPE in AbstractMetrics From Race Condition by @ankitsultana in #10022