Impala SQL Dialect

Impala is an MPP (massively parallel processing) SQL query engine for data stored on a Hadoop cluster.

Registering the JDBC Driver in EXAOperation

Register the Impala JDBC driver in EXAOperation as follows:

Click "Software"
Switch to tab "JDBC Drivers"
Click "Browse..."
Select JDBC driver file
Click "Upload"
Click "Add"
In dialog "Add EXACluster JDBC driver" configure the JDBC driver (see below)

You need to specify the following settings when adding the JDBC driver via EXAOperation.

Parameter	Value
Name	`IMPALA`
Main	`com.cloudera.impala.jdbc41.Driver`
Prefix	`jdbc:impala:`
Files	`ImpalaJDBC41.jar`

Uploading the JDBC Driver to BucketFS

Create a bucket in BucketFS
Upload the driver to BucketFS

This step is necessary because the UDF container in which the adapter runs cannot access the JDBC drivers installed via EXAOperation, but it can access BucketFS.
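As a rough illustration of the upload step, the sketch below prepares an HTTP PUT request against BucketFS using only the Python standard library. BucketFS accepts uploads over HTTP with basic authentication, where the user name is `w` and the password is the bucket's write password. The host name, port `2580`, bucket name `jars`, and password in the usage note are placeholders assumed for this example, and the function names are hypothetical.

```python
import base64
import urllib.request


def build_bucketfs_upload(host, port, bucket, filename, write_password, data):
    """Prepare (but do not send) an HTTP PUT request that uploads `data` to a bucket."""
    url = f"http://{host}:{port}/{bucket}/{filename}"
    request = urllib.request.Request(url, data=data, method="PUT")
    # BucketFS write access uses HTTP basic auth with the fixed user name "w".
    token = base64.b64encode(f"w:{write_password}".encode()).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


def send(request):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

To upload the driver, read `ImpalaJDBC41.jar` in binary mode, pass its bytes to `build_bucketfs_upload("exasol.example.com", 2580, "jars", "ImpalaJDBC41.jar", "<write password>", jar_bytes)`, and hand the result to `send`. Keeping request construction separate from sending makes it possible to inspect the target URL before any data leaves the machine.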

Installing the Adapter Script

Upload the latest available release of the Virtual Schema JDBC Adapter to BucketFS.

Then create a schema to hold the adapter script.

CREATE SCHEMA ADAPTER;

The SQL statement below creates the adapter script, defines the Java class that serves as the entry point, and tells the UDF framework where to find the libraries (JAR files) for the Virtual Schema and the database driver.

CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.JDBC_ADAPTER AS
  %scriptclass com.exasol.adapter.RequestDispatcher;
  %jar /buckets/<BFS service>/<bucket>/virtualschema-jdbc-adapter-dist-1.19.2.jar;
  %jar /buckets/<BFS service>/<bucket>/ImpalaJDBC41.jar;
/
;
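The `%jar` paths follow the pattern `/buckets/<BFS service>/<bucket>/<file>`. The following sketch shows one way to fill in those placeholders programmatically and render the statement above. The helper names are hypothetical, and the service name `bfsdefault` and bucket name `jars` are assumptions for illustration only; substitute the values from your own installation.

```python
def udf_jar_path(bfs_service, bucket, filename):
    """Return the path under which a BucketFS file is referenced in a %jar line."""
    return f"/buckets/{bfs_service}/{bucket}/{filename}"


def render_adapter_script(bfs_service, bucket, jars):
    """Render the CREATE ... ADAPTER SCRIPT statement for the given JAR files."""
    lines = [
        "CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.JDBC_ADAPTER AS",
        "  %scriptclass com.exasol.adapter.RequestDispatcher;",
    ]
    # One %jar line per library: the adapter distribution and the JDBC driver.
    lines += [f"  %jar {udf_jar_path(bfs_service, bucket, jar)};" for jar in jars]
    # Mirror the terminator used in the statement shown in this document.
    lines += ["/", ";"]
    return "\n".join(lines)


# Assumed example values; replace them with your BucketFS service and bucket.
print(render_adapter_script(
    "bfsdefault",
    "jars",
    ["virtualschema-jdbc-adapter-dist-1.19.2.jar", "ImpalaJDBC41.jar"],
))
```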

Defining a Named Connection

Define the connection to Impala as shown below.

CREATE OR REPLACE CONNECTION IMPALA_CONNECTION 
TO 'jdbc:impala://<Impala host>:<port>' 
USER '<user>' 
IDENTIFIED BY '<password>';

Creating a Virtual Schema

The statement below creates an Impala virtual schema. Note that you must provide the name of the source schema in SCHEMA_NAME.

CREATE VIRTUAL SCHEMA <virtual schema name>
    USING ADAPTER.JDBC_ADAPTER 
    WITH
    SQL_DIALECT     = 'IMPALA'
    CONNECTION_NAME = 'IMPALA_CONNECTION'
    SCHEMA_NAME     = '<schema name>';

Connecting to a Kerberos-Secured Hadoop

Connecting to a Kerberos-secured Impala service differs in only one aspect: the CONNECTION object must contain all the information required for Kerberos authentication.

Understanding How It Works (Optional)

Both the adapter script and the internally used IMPORT FROM JDBC statement support Kerberos authentication. They detect that a connection is a Kerberos connection by a special prefix in the IDENTIFIED BY field. In that case, authentication uses a Kerberos keytab and a Kerberos configuration file (via the JAAS Java API).

The CONNECTION object stores all relevant information and files in its fields:

The TO field contains the JDBC connection string
The USER field contains the Kerberos principal
The IDENTIFIED BY field contains the Kerberos configuration file and keytab file (base64 encoded) along with an internal prefix ExaAuthType=Kerberos; to identify the CONNECTION as a Kerberos CONNECTION.
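To make the layout of the IDENTIFIED BY field concrete, here is a minimal sketch of how such a value could be assembled. It assumes the three parts are joined with semicolons in the order prefix, configuration file, keytab, which is inferred from the description above and from the sample output in this document; the function name is hypothetical and this is not the code of the provided script.

```python
import base64

KERBEROS_PREFIX = "ExaAuthType=Kerberos"


def build_identified_by(krb5_conf: bytes, keytab: bytes) -> str:
    """Join the prefix with the base64-encoded config file and keytab."""
    parts = [
        KERBEROS_PREFIX,
        base64.b64encode(krb5_conf).decode("ascii"),  # Kerberos configuration file
        base64.b64encode(keytab).decode("ascii"),     # Kerberos keytab
    ]
    # Assumption: fields are separated by ";" in this order.
    return ";".join(parts)


# Tiny placeholder contents standing in for real krb5.conf and keytab files.
print(build_identified_by(b"zzz\n", b"aaa\n"))
# prints: ExaAuthType=Kerberos;enp6Cg==;YWFhCg==
```

Base64 is needed because a keytab is a binary file, while the IDENTIFIED BY field holds a plain string.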

Generating the CREATE CONNECTION Statement

To simplify the creation of Kerberos CONNECTION objects, the create_kerberos_conn.py Python script is provided. The script requires five arguments, in this order:

CONNECTION name (arbitrary name for the new CONNECTION)
Kerberos principal for Hadoop (i.e., Hadoop user)
Kerberos configuration file path (e.g., krb5.conf)
Kerberos keytab file path, which contains keys for the Kerberos principal
JDBC connection string

Example command:

python tools/create_kerberos_conn.py krb_conn krbuser@EXAMPLE.COM /etc/krb5.conf ./krbuser.keytab \
  'jdbc:impala://<Impala host>:<port>;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host.example.com;KrbServiceName=impala'

Output:

CREATE CONNECTION krb_conn TO 'jdbc:impala://<Impala host>:<port>;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host.example.com;KrbServiceName=impala' USER 'krbuser@EXAMPLE.COM' IDENTIFIED BY 'ExaAuthType=Kerberos;enp6Cg==;YWFhCg=='
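Before running a generated statement, it can be useful to confirm that it carries the expected principal and files. The sketch below parses a statement of the shape shown above and decodes the two base64 payloads. The function name and regular expression are assumptions made for this example, and they only handle the single-line form printed in the sample output.

```python
import base64
import re

STATEMENT_PATTERN = re.compile(
    r"CREATE\s+CONNECTION\s+(?P<name>\S+)\s+TO\s+'(?P<to>[^']*)'\s+"
    r"USER\s+'(?P<user>[^']*)'\s+IDENTIFIED\s+BY\s+'(?P<secret>[^']*)'",
    re.IGNORECASE,
)


def parse_kerberos_connection(statement: str) -> dict:
    """Split a generated CREATE CONNECTION statement into its decoded parts."""
    match = STATEMENT_PATTERN.search(statement)
    if match is None:
        raise ValueError("not a CREATE CONNECTION statement of the expected shape")
    # Expected layout: prefix;base64(config file);base64(keytab)
    prefix, conf_b64, keytab_b64 = match.group("secret").split(";")
    if prefix != "ExaAuthType=Kerberos":
        raise ValueError("IDENTIFIED BY does not start with the Kerberos prefix")
    return {
        "name": match.group("name"),
        "to": match.group("to"),
        "principal": match.group("user"),
        "krb5_conf": base64.b64decode(conf_b64),
        "keytab": base64.b64decode(keytab_b64),
    }
```

Applied to the sample output, the returned dictionary contains the connection name `krb_conn`, the principal `krbuser@EXAMPLE.COM`, and the placeholder file contents `b"zzz\n"` and `b"aaa\n"`.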

Creating the Connection

You have to execute the generated CREATE CONNECTION statement directly in EXASOL to actually create the Kerberos CONNECTION object. For more detailed information about the script, use the help option:

python tools/create_kerberos_conn.py -h

Using the Connection When Creating a Virtual Schema

You can now create a virtual schema using the Kerberos connection created before.

CREATE VIRTUAL SCHEMA <virtual schema name> 
   USING ADAPTER.JDBC_ADAPTER
   WITH
   SQL_DIALECT     = 'IMPALA'
   CONNECTION_NAME = 'KRB_CONN'
   SCHEMA_NAME     = '<schema name>';
