Skip to content

Latest commit

 

History

History
136 lines (95 loc) · 5.43 KB

impala.md

File metadata and controls

136 lines (95 loc) · 5.43 KB

Impala SQL Dialect

Impala is a MPP (Massive Parallel Processing) SQL query engine for processing data that is stored on a Hadoop cluster.

Registering the JDBC Driver in EXAOperation

First download the Impala JDBC driver.

Now register the driver in EXAOperation:

  1. Click "Software"
  2. Switch to tab "JDBC Drivers"
  3. Click "Browse..."
  4. Select JDBC driver file
  5. Click "Upload"
  6. Click "Add"
  7. In dialog "Add EXACluster JDBC driver" configure the JDBC driver (see below)

You need to specify the following settings when adding the JDBC driver via EXAOperation.

Parameter Value
Name IMPALA
Main com.cloudera.impala.jdbc41.Driver
Prefix jdbc:impala:
Files ImpalaJDBC41.jar

Uploading the JDBC Driver to EXAOperation

  1. Create a bucket in BucketFS
  2. Upload the driver to BucketFS

This step is necessary since the UDF container the adapter runs in has no access to the JDBC drivers installed via EXAOperation but it can access BucketFS.

Installing the Adapter Script

Upload the latest available release of Virtual Schema JDBC Adapter to Bucket FS.

Then create a schema to hold the adapter script.

CREATE SCHEMA ADAPTER;

The SQL statement below creates the adapter script, defines the Java class that serves as entry point and tells the UDF framework where to find the libraries (JAR files) for Virtual Schema and database driver.

CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.JDBC_ADAPTER AS
  %scriptclass com.exasol.adapter.RequestDispatcher;
  %jar /buckets/<BFS service>/<bucket>/virtualschema-jdbc-adapter-dist-1.19.2.jar;
  %jar /buckets/<BFS service>/<bucket>/ImpalaJDBC41.jar;
/
;

Defining a Named Connection

Define the connection to Impala as shown below.

CREATE OR REPLACE CONNECTION IMPALA_CONNECTION 
TO 'jdbc:impala://<Impala host>:<port>' 
USER '<user>' 
IDENTIFIED BY '<password>';

Creating a Virtual Schema

Below you see how an Impala Virtual Schema is created. Please note that you have to provide the name of a schema.

CREATE VIRTUAL SCHEMA <virtual schema name>
    USING ADAPTER.JDBC_ADAPTER 
    WITH
    SQL_DIALECT     = 'IMPALA'
    CONNECTION_NAME = 'IMPALA_CONNECTION'
    SCHEMA_NAME     = '<schema name>';

Connecting To a Kerberos Secured Hadoop

Connecting to a Kerberos secured Impala service only differs in one aspect: You have a CONNECTION object which contains all the relevant information for the Kerberos authentication.

Understanding how it Works (Optional)

Both the adapter script and the internally used IMPORT FROM JDBC statement support Kerberos authentication. They detect, that the connection is a Kerberos connection by a special prefix in the IDENTIFIED BY field. In such case, the authentication will happen using a Kerberos keytab and Kerberos config file (using the JAAS Java API).

The CONNECTION object stores all relevant information and files in its fields:

  • The TO field contains the JDBC connection string
  • The USER field contains the Kerberos principal
  • The IDENTIFIED BY field contains the Kerberos configuration file and keytab file (base64 encoded) along with an internal prefix ExaAuthType=Kerberos; to identify the CONNECTION as a Kerberos CONNECTION.

Generating the CREATE CONNECTION Statement

In order to simplify the creation of Kerberos CONNECTION objects, the create_kerberos_conn.py Python script has been provided. The script requires 5 arguments:

  • CONNECTION name (arbitrary name for the new CONNECTION)
  • Kerberos principal for Hadoop (i.e., Hadoop user)
  • Kerberos configuration file path (e.g., krb5.conf)
  • Kerberos keytab file path, which contains keys for the Kerberos principal
  • JDBC connection string

Example command:

python tools/create_kerberos_conn.py krb_conn [email protected] /etc/krb5.conf ./krbuser.keytab \
  'jdbc:impala://<Impala host>:<port>;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host.example.com;KrbServiceName=impala'

Output:

CREATE CONNECTION krb_conn TO 'jdbc:impala://<Impala host>:<port>;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host.example.com;KrbServiceName=impala' USER '[email protected]' IDENTIFIED BY 'ExaAuthType=Kerberos;enp6Cg==;YWFhCg=='

Creating the connection

You have to execute the generated CREATE CONNECTION statement directly in EXASOL to actually create the Kerberos CONNECTION object. For more detailed information about the script, use the help option:

python tools/create_kerberos_conn.py -h

Using the Connection When Creating a Virtual Schema

You can now create a virtual schema using the Kerberos connection created before.

CREATE VIRTUAL SCHEMA <virtual schema name> 
   USING ADAPTER.JDBC_ADAPTER
   WITH
   SQL_DIALECT     = 'IMPALA'
   CONNECTION_NAME = 'KRB_CONN'
   SCHEMA_NAME     = '<schema name>';