Optimization of Filter Performance

Building on the work done to optimize sort performance, this page looks at optimizing filter performance using the same techniques.

Baseline

The test uses the same mock data generator (DRILL-5152, PR 708 to produce 10 million rows with one integer each. Data is randomly uniformly distributed in the Java int range. We use a WHERE clause to eliminate roughly half the rows:

SELECT id_i FROM `mock`.employee_50M WHERE id_i > 0

The above means to create an integer column (_i) in a table of 50 million rows (_50m). As it turns out, using only 10 million rows results in run times that are too quick to be easily analyzed.

We use the test framework from DRILL-5126, PR 710 to run timed tests on a Macbook Pro with an Intel i5 processor. The mock data source produces batches of about 32K in size (adjusted down from the default 256K.) This gives about 8K records per batch and about 6100 batches passed to the filter. All tests were run from within Eclipse, using the Run option (not debug) using an embedded Drillbit.

The baseline runs the test five times, takes the average of all but the first, and prints details for the second-to-last:

Read  24992460 records in 6104 batches.
...
Avg run time: 1784
Op: 0 Screen
  Setup:   0 - 0%, 0%
  Process: 11 - 0%, 0%
  Wait:    26
Op: 1 Project
  Setup:   0 - 0%, 0%
  Process: 6 - 0%, 0%
Op: 2 SelectionVectorRemover
  Setup:   1 - 25%, 0%
  Process: 189 - 12%, 12%
Op: 3 Filter
  Setup:   3 - 75%, 0%
  Process: 575 - 38%, 38%
Op: 4 Scan
  Setup:   0 - 0%, 0%
  Process: 709 - 47%, 47%
Total:
  Setup:   4
  Process: 1490

In the above, times are in ms. The two percentages are percentage of the category (setup or process) and percent of total run time.

Observations:

Selection vector remover setup takes 12% of the run time.
Filter takes 38% of the run time.
We are already fairly efficient because the mock scan (the bare minimum data source, no disk I/O) takes 47% of the run time.

Easy First Steps

The first thing to do is to apply the lessons already learned. Let's switch to plain Java code generation with the JDK.

Results:

Read  25003415 records in 6104 batches.
...
Avg run time: 1724
...
Op: 2 SelectionVectorRemover
  Setup:   1 - 25%, 0%
  Process: 178 - 12%, 12%
Op: 3 Filter
  Setup:   3 - 75%, 0%
  Process: 498 - 36%, 35%
Op: 4 Scan
  Setup:   0 - 0%, 0%
  Process: 690 - 49%, 49%
Total:
  Setup:   4
  Process: 1382

This got us a very small gain: about 60ms or about 3%. The important reason for the switch is that doing so allows us to hand-optimize the filter code.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimization of Filter Performance

Baseline

Easy First Steps

Clone this wiki locally