Storage Plugin Runtime Interface
With the plan complete, attention now shifts to execution. The Foreman serializes the plan (including the sub scan operator descriptions), then ships it to each Drillbit for execution. Each minor fragment is run by a dedicated thread. The first task is to create the record batch (operator implementation) for each operator (description). To do this, the receiving Drillbit deserializes the plan to recreate the operator tree, using Jackson to create the plan objects as described above.
Next, the fragment executor must create a record batch for each operator definition via the indirection of a record batch creator which implements BatchCreator. This is done using a process similar to the way that plugin configurations are linked to plugins: via a map from operator definition class to record batch creator constructor.
During startup, the Drillbit searches the scan path for classes which implement BatchCreator. For each, the code gets the list of implemented interfaces, one of which must be BatchCreator. BatchCreator itself is a parameterized type, so the code next looks for the type of the parameterization argument. For example:
public class MockScanBatchCreator implements BatchCreator<MockSubScanPOP> {
The creator class must have a zero-argument constructor. If it does, that constructor is placed into a map keyed by the parameterized class (here MockSubScanPOP). The result: a map from physical operator (pop) definition to creator constructor.
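The lookup described above can be sketched with plain Java reflection. This is a minimal illustration, not Drill's actual code: SketchBatchCreator, SketchSubScan, SketchScanCreator and CreatorRegistry are hypothetical stand-ins, and the real BatchCreator interface takes more arguments. The sketch reads the type argument of the implemented interface and stores the zero-argument constructor under that class.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for BatchCreator; Drill's real interface has a richer signature.
interface SketchBatchCreator<T> {
    String getBatch(T config);
}

// Hypothetical operator definition (plays the role of MockSubScanPOP).
class SketchSubScan {
    final String table;
    SketchSubScan(String table) { this.table = table; }
}

// Hypothetical creator; note the zero-argument constructor.
class SketchScanCreator implements SketchBatchCreator<SketchSubScan> {
    public SketchScanCreator() { }
    @Override
    public String getBatch(SketchSubScan config) { return "scan:" + config.table; }
}

class CreatorRegistry {
    // Operator definition class -> zero-argument constructor of its creator.
    private final Map<Class<?>, Constructor<?>> creators = new HashMap<>();

    // Inspects the generic interfaces of the creator class and registers its
    // constructor under the type argument of SketchBatchCreator.
    void register(Class<?> creatorClass) {
        for (Type iface : creatorClass.getGenericInterfaces()) {
            if (!(iface instanceof ParameterizedType)) continue;
            ParameterizedType pt = (ParameterizedType) iface;
            if (pt.getRawType() != SketchBatchCreator.class) continue;
            Type arg = pt.getActualTypeArguments()[0];
            if (!(arg instanceof Class)) continue;
            try {
                creators.put((Class<?>) arg, creatorClass.getDeclaredConstructor());
            } catch (NoSuchMethodException e) {
                throw new IllegalStateException("No zero-argument constructor: " + creatorClass, e);
            }
        }
    }

    // Finds the constructor for the given operator definition and instantiates the creator.
    @SuppressWarnings("unchecked")
    <T> SketchBatchCreator<T> creatorFor(T config) {
        Constructor<?> ctor = creators.get(config.getClass());
        if (ctor == null) {
            throw new IllegalArgumentException("No creator for " + config.getClass());
        }
        try {
            return (SketchBatchCreator<T>) ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

class CreatorRegistryDemo {
    public static void main(String[] args) {
        CreatorRegistry registry = new CreatorRegistry();
        registry.register(SketchScanCreator.class);
        SketchSubScan config = new SketchSubScan("t1");
        System.out.println(registry.creatorFor(config).getBatch(config)); // prints "scan:t1"
    }
}
```

Keying the map on the operator definition class means the executor never needs to know which creator class handles which operator; it only needs the deserialized definition object.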
Given this map, the fragment executor can easily find the constructor for the batch creator. From the constructor the executor gets the batch creator class and creates an instance. Finally, the executor calls:
public ScanBatch getBatch(FragmentContext context, MockSubScanPOP config, List<RecordBatch> children)
This returns an instance of the scan batch (operator implementation) given the fragment context, the operator definition, and the list of previously-created children (upstream operators) for this operator. Since our focus here is on scanners, the child list will always be empty.
Given the batch created above, the fragment executor initializes the batch by calling the setup method. Operators that know their schema can set up the schema here. Otherwise, schema setup can wait until later.
Most scanners extend ScanBatch, but doing so is not required; it is simply a convenience. ScanBatch creates an operator-specific RecordReader subclass to handle actual reading, allowing the generic ScanBatch operator implementation to handle interfacing into the Drill operator hierarchy.
The fragment now runs. The tree calls next() on the ScanBatch to return a batch of records. (By convention, the first batch should return just a schema.)
- ScanBatch calls allocate() on the RecordReader to set up the vectors for the record batch.
- Repeatedly calls the next() method on the RecordReader to read a batch of rows into the vectors allocated above.
- Sets the row count on each of the value vectors.
- Determines if the batch includes a schema different than the previous batch.
- Passes along the proper status code to the caller.
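The steps above can be sketched as a small loop body. This is a simplified illustration under stated assumptions, not the real ScanBatch: SketchOutcome, SketchRecordReader, CountingReader and SketchScanBatch are hypothetical names, a single int array stands in for the value vectors, and the schema-only first batch convention is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical status codes returned to the caller.
enum SketchOutcome { OK_NEW_SCHEMA, OK, NONE }

// Hypothetical reader contract, loosely following the steps in the text.
interface SketchRecordReader {
    List<String> schema();        // field names for the data being read
    void allocate(int[] vector);  // prepare the vector for a new batch
    int next(int[] vector);       // fill the vector; return rows read, 0 at end of data
}

// Toy reader that emits the integers 0..total-1 in one column named "n".
class CountingReader implements SketchRecordReader {
    private final int total;
    private int produced = 0;
    CountingReader(int total) { this.total = total; }
    @Override public List<String> schema() { return Collections.singletonList("n"); }
    @Override public void allocate(int[] vector) { Arrays.fill(vector, 0); }
    @Override public int next(int[] vector) {
        int rows = Math.min(vector.length, total - produced);
        for (int i = 0; i < rows; i++) vector[i] = produced++;
        return rows;
    }
}

class SketchScanBatch {
    private final SketchRecordReader reader;
    private final int[] vector;      // stands in for the batch's value vectors
    private int rowCount = 0;
    private List<String> lastSchema = null;

    SketchScanBatch(SketchRecordReader reader, int capacity) {
        this.reader = reader;
        this.vector = new int[capacity];
    }

    SketchOutcome next() {
        reader.allocate(vector);                  // set up the vectors
        int rows = reader.next(vector);           // read a batch of rows
        rowCount = rows;                          // record the row count
        if (rows == 0) return SketchOutcome.NONE; // no more data
        List<String> schema = reader.schema();
        boolean changed = !schema.equals(lastSchema); // compare with the previous batch
        lastSchema = schema;
        return changed ? SketchOutcome.OK_NEW_SCHEMA : SketchOutcome.OK;
    }

    int rowCount() { return rowCount; }
    int valueAt(int index) { return vector[index]; }
}

class SketchScanBatchDemo {
    public static void main(String[] args) {
        SketchScanBatch batch = new SketchScanBatch(new CountingReader(5), 2);
        SketchOutcome outcome;
        while ((outcome = batch.next()) != SketchOutcome.NONE) {
            System.out.println(outcome + " rows=" + batch.rowCount());
        }
    }
}
```

With five rows and a batch capacity of two, the demo reports one OK_NEW_SCHEMA batch followed by two OK batches, the last holding a single row.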
Of course, some of these steps involve more than the summary suggests.
- ScanBatch defines a Mutator class which must be used to build the set of value vectors. The Mutator takes a field schema in the form of a MaterializedField.
- Uses the TypeHelper to convert from the field schema to a value vector instance.
- Registers the value vector with the vector container associated with the ScanBatch.
- Adds the value vector to the field vector map, indexed by field name (actually, a field path for nested structures).
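The Mutator steps above might look roughly like the following. All names here (SketchField, SketchVector, SketchTypeHelper, SketchMutator) are hypothetical stand-ins for MaterializedField, the value vector classes, TypeHelper and the Mutator; the type names "INT" and "VARCHAR" and the choice to return an existing vector when a field path is added twice are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field schema: a path (for nested structures) plus a type name.
class SketchField {
    final String path;
    final String type;
    SketchField(String path, String type) { this.path = path; this.type = type; }
}

// Hypothetical value vector hierarchy.
abstract class SketchVector {
    final SketchField field;
    SketchVector(SketchField field) { this.field = field; }
}
class SketchIntVector extends SketchVector {
    SketchIntVector(SketchField field) { super(field); }
}
class SketchVarCharVector extends SketchVector {
    SketchVarCharVector(SketchField field) { super(field); }
}

// Hypothetical helper that maps a field schema to a vector instance.
class SketchTypeHelper {
    static SketchVector newVector(SketchField field) {
        switch (field.type) {
            case "INT": return new SketchIntVector(field);
            case "VARCHAR": return new SketchVarCharVector(field);
            default: throw new IllegalArgumentException("Unsupported type: " + field.type);
        }
    }
}

class SketchMutator {
    // Stands in for the vector container associated with the scan batch.
    private final List<SketchVector> container = new ArrayList<>();
    // Field path -> vector.
    private final Map<String, SketchVector> fieldVectorMap = new HashMap<>();

    <T extends SketchVector> T addField(SketchField field, Class<T> vectorClass) {
        SketchVector vector = fieldVectorMap.get(field.path);
        if (vector == null) {
            vector = SketchTypeHelper.newVector(field); // schema -> vector instance
            container.add(vector);                      // register with the container
            fieldVectorMap.put(field.path, vector);     // index by field path
        }
        if (!vectorClass.isInstance(vector)) {
            throw new IllegalStateException("Field " + field.path + " is not a " + vectorClass.getSimpleName());
        }
        return vectorClass.cast(vector);
    }

    int vectorCount() { return container.size(); }
}

class SketchMutatorDemo {
    public static void main(String[] args) {
        SketchMutator mutator = new SketchMutator();
        mutator.addField(new SketchField("a.b", "INT"), SketchIntVector.class);
        mutator.addField(new SketchField("name", "VARCHAR"), SketchVarCharVector.class);
        System.out.println(mutator.vectorCount()); // prints 2
    }
}
```

Routing every vector through one method keeps the container and the field map in step, which is presumably why the text says the Mutator "must be used" rather than creating vectors directly.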
Drill provides three ways to create a scan operator:
- From scratch: the plugin author creates a scan (batch) operator that handles all the details of the scan and integration into the Drill operator tree.
- Based on the Drill ScanBatch class: the ScanBatch handles the generic operator details, delegating actual reading to a subclass of RecordReader.
- Via the "easy" framework. (Details needed.)