
EVF Tutorial Table Properties

Paul Rogers edited this page Jun 18, 2019 · 2 revisions

Override Plugin Config using Table Properties

Previous sections showed how to use a "provided schema", set with CREATE SCHEMA, to specify column types. We can go one step further and let the user override plugin config properties using table properties in the provided schema.

The log format plugin defines three properties: the regex, the maximum error count and the schema. We already saw how we can use the provided schema to add types to the plugin schema. Here, we will see how we can replace all three properties.

A provided schema has two parts: a list of columns (which we saw previously) and a set of table properties. We will use table properties in this section.

Here is where we want to end up:

  • The user creates an "empty" plugin config, specifying only the file extension (so that the plugin is associated with the right files).
  • The user specifies the regex (and max error count) using table properties in the provided schema.
  • The columns in the schema provide column names and types. They match up to regex groups by position, just as in the plugin config.

The result is that, if users have many log files of different formats, they can specify the format per table rather than creating a plugin config for each file type. The primary goal is to show how this is done, though the resulting functionality may even be useful.

Override the Regex Property

Previous steps have already shown how we get the provided schema. Let's define some property names:

public class LogFormatPlugin extends EasyFormatPlugin<LogFormatConfig> {
  ...
  public static final String PROP_PREFIX = TupleMetadata.DRILL_PROP_PREFIX + "regex.";
  public static final String REGEX_PROP = PROP_PREFIX + "regex";
  public static final String MAX_ERRORS_PROP = PROP_PREFIX + "maxErrors";

The resulting names are drill.regex.regex and drill.regex.maxErrors. The drill prefix is standard for anything that Drill provides (users can add their own properties with other names). The regex segment was chosen to identify these as properties specific to this one plugin. The final part of each name is the same as the name of the corresponding plugin config property. They don't have to match, but using the same names may make the properties easier to remember.

Next, let's get the regex property and use it in place of the plugin config:

  private Pattern setupPattern(TupleMetadata providedSchema) {
    String regex = formatConfig.getRegex();
    if (providedSchema != null) {
      regex = providedSchema.property(REGEX_PROP, regex);
    }
    return Pattern.compile(regex);
  }

That's really all there is to it: if a regex is given in the provided schema, it is used. Everything else works as before.

The logic for overriding maxErrors is similar, except we use intProperty() to get the integer value of a property.
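The fallback pattern for both properties can be sketched outside of Drill as follows. This is only an illustration: a plain Map stands in for the table properties of the provided schema, and the class TablePropertyOverrideSketch and its methods resolveRegex() and resolveMaxErrors() are hypothetical names, not part of the plugin. In the plugin itself the lookups go through TupleMetadata, as shown above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: a Map stands in for the table properties of a provided schema.
 * The class and method names here are hypothetical, not Drill APIs.
 */
final class TablePropertyOverrideSketch {

  // The names the tutorial derives: drill.regex.regex and drill.regex.maxErrors.
  static final String PROP_PREFIX = "drill.regex.";
  static final String REGEX_PROP = PROP_PREFIX + "regex";
  static final String MAX_ERRORS_PROP = PROP_PREFIX + "maxErrors";

  /** Returns the table property if present, else the plugin config value. */
  static String resolveRegex(Map<String, String> tableProps, String configRegex) {
    if (tableProps == null) {
      return configRegex;
    }
    return tableProps.getOrDefault(REGEX_PROP, configRegex);
  }

  /** Same pattern for an integer property: parse if present, else fall back. */
  static int resolveMaxErrors(Map<String, String> tableProps, int configMaxErrors) {
    if (tableProps == null) {
      return configMaxErrors;
    }
    String value = tableProps.get(MAX_ERRORS_PROP);
    return value == null ? configMaxErrors : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> tableProps = new HashMap<>();
    tableProps.put(MAX_ERRORS_PROP, "10");
    // Only maxErrors is overridden; the regex falls back to the config value.
    System.out.println(resolveRegex(tableProps, "(.*)"));
    System.out.println(resolveMaxErrors(tableProps, 5));
  }
}
```

Passing the plugin config value as the default keeps the behavior unchanged for tables that have no provided schema or that do not set the property.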

The Tricky Bits

For this plugin, things are not quite as simple as the above implies. Since the plugin config for the regex plugin can itself define a schema, we need some logic to decide when to use the plugin config schema, and when to use columns in the provided schema. See LogFormatPlugin for the details. Most plugins won't have that complexity because most have no way to define a schema in the plugin config.

Example

Suppose we have the Drill log files and we just want to pull out the date. Example line:

2017-12-17 10:52:41,820 [main] INFO  o.a.d.e.e.f.FunctionImplementationRegistry...

Here is the CREATE SCHEMA statement we would use:

CREATE SCHEMA (
  `year` int not null,
  `month` int not null,
  `day` int not null)
FOR TABLE dfs.example.myTable
PROPERTIES (
  'drill.regex.regex'='(\d\d\d\d)-(\d\d)-(\d\d) .*',
  'drill.regex.maxErrors'='10')

You can see this example in action in TestLogReader.testSchemaOnlyWithCols().
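To check what the regex in the PROPERTIES clause captures, it can be run against the example line with plain java.util.regex. The sketch below is independent of Drill: the class DateRegexSketch and its method parseDate() are hypothetical, and it assumes the whole line must match the pattern. The three groups line up by position with the `year`, `month` and `day` columns.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: applies the example regex to one log line, outside of Drill. */
final class DateRegexSketch {

  // The regex from the PROPERTIES clause; backslashes are doubled for Java.
  static final Pattern DATE_PATTERN =
      Pattern.compile("(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*");

  /** Returns {year, month, day}, or null if the line does not match. */
  static int[] parseDate(String line) {
    Matcher m = DATE_PATTERN.matcher(line);
    if (!m.matches()) {
      return null; // a non-matching line; the plugin would count it as an error
    }
    int[] fields = new int[m.groupCount()];
    for (int i = 0; i < fields.length; i++) {
      // Regex groups are 1-based; columns are matched to groups by position.
      fields[i] = Integer.parseInt(m.group(i + 1));
    }
    return fields;
  }

  public static void main(String[] args) {
    String line = "2017-12-17 10:52:41,820 [main] INFO  "
        + "o.a.d.e.e.f.FunctionImplementationRegistry...";
    System.out.println(Arrays.toString(parseDate(line)));
  }
}
```

Because the columns are declared as int not null, each captured group has to parse as an integer, which this pattern guarantees by matching digits only.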


Next: Enhanced Error Reporting
