-
Notifications
You must be signed in to change notification settings - Fork 980
Optimization of Filter Performance
Building on the work done to optimize sort performance, this page looks at optimizing filter performance using the same techniques.
The test uses the same mock data generator (DRILL-5152, PR 708 to produce 10 million rows with one integer each. Data is randomly uniformly distributed in the Java int range. We use a WHERE clause to eliminate roughly half the rows:
SELECT id_i FROM `mock`.employee_50M WHERE id_i > 0
The above means to create an integer column (_i
) in a table of 50 million rows (_50m
). As it turns out, using only 10 million rows results in run times that are too quick to be easily analyzed.
We use the test framework from DRILL-5126, PR 710 to run timed tests on a Macbook Pro with an Intel i5 processor. The mock data source produces batches of about 32K in size (adjusted down from the default 256K.) This gives about 8K records per batch and about 6100 batches passed to the filter. All tests were run from within Eclipse, using the Run option (not debug) using an embedded Drillbit.
The baseline runs the test five times, takes the average of all but the first, and prints details for the second-to-last:
Read 24992460 records in 6104 batches.
...
Avg run time: 1784
Op: 0 Screen
Setup: 0 - 0%, 0%
Process: 11 - 0%, 0%
Wait: 26
Op: 1 Project
Setup: 0 - 0%, 0%
Process: 6 - 0%, 0%
Op: 2 SelectionVectorRemover
Setup: 1 - 25%, 0%
Process: 189 - 12%, 12%
Op: 3 Filter
Setup: 3 - 75%, 0%
Process: 575 - 38%, 38%
Op: 4 Scan
Setup: 0 - 0%, 0%
Process: 709 - 47%, 47%
Total:
Setup: 4
Process: 1490
In the above, times are in ms. The two percentages are percentage of the category (setup or process) and percent of total run time.
Observations:
- Selection vector remover setup takes 12% of the run time.
- Filter takes 38% of the run time.
- We are already fairly efficient because the mock scan (the bare minimum data source, no disk I/O) takes 47% of the run time.
The first thing to do is to apply the lessons already learned. Let's switch to plain Java code generation with the JDK.
Results:
Read 25003415 records in 6104 batches.
...
Avg run time: 1724
...
Op: 2 SelectionVectorRemover
Setup: 1 - 25%, 0%
Process: 178 - 12%, 12%
Op: 3 Filter
Setup: 3 - 75%, 0%
Process: 498 - 36%, 35%
Op: 4 Scan
Setup: 0 - 0%, 0%
Process: 690 - 49%, 49%
Total:
Setup: 4
Process: 1382
This got us a very small gain: about 60ms or about 3%. The important reason for the switch is that doing so allows us to hand-optimize the filter code.