Skip to content

Latest commit

 

History

History
105 lines (74 loc) · 3.33 KB

4-data-investigation.md

File metadata and controls

105 lines (74 loc) · 3.33 KB

𝌭️ Data Modelling and Investigation

Superset, Trino, Hive, S3 example usage

Create a simple dataset from s3 using trino and hive, discovery with superset

  1. Login as your USER_NAME using the OpenShift v4 button and FreeIPA identity provider to Superset using the url.

    oc login --server=https://api.${CLUSTER_DOMAIN##apps.}:6443 -u <USER_NAME> -p <PASSWORD>
    echo -e https://$(oc get route superset --template='{{ .spec.host }}' -n ${PROJECT_NAME})

    4-superset-login-empty

  2. Browse to Data > Databases

  3. Select +Database button

    4-superset-browse-database

  4. Choose Trino and supply the SQLAlchemy URI we connect using our ldap user credentials.

    trino://${USER_NAME}:${PASSWORD}@trino-service:8443

    4-superset-url

  5. Add the trino SSL CA certificate into the Advanced > Security > ROOT CERTIFICATE section

    cat /projects/rainforest/supply-chain/trino/trino-certs/ca.crt

    4-superset-ca-cert

  6. Update the Advanced > SQL Lab and select these tick boxes:

    Expose database in SQL Lab
    Allow CREATE TABLE AS
    Allow DML
    Allow Multi Schema Metadata Fetch
    Enable query cost estimation
    Allow this database to be explored

    4-superset-sqllab-settings

  7. Select Connect, if settings correct the connection will be created. If it fails, double check the SSL steps for trino trustore in the Secrets section were completed OK.

    4-superet-connect-ok

  8. Browse to SQLLab > SLQEditor

  9. Create a hive catalog from our wine_quality.csv data stored in S3 (make sure you copy the csv fle into s3 from the Spark Exercise)

    CREATE TABLE demo.default.wine_quality
    (
    "fixed acidity" VARCHAR,
    "volatile acidity" VARCHAR,
    "citric acid" VARCHAR,
    "residual sugar" VARCHAR,
    "chlorides" VARCHAR,
    "free sulfur dioxide" VARCHAR,
    "total sulfur dioxide" VARCHAR,
    "density" VARCHAR,
    "pH" VARCHAR,
    "sulphates" VARCHAR,
    "alcohol" VARCHAR,
    "quality" VARCHAR
    )
    WITH (FORMAT = 'CSV',
    skip_header_line_count = 1,
    EXTERNAL_LOCATION = 's3a://data/'
    )
    

    4-superset-hive-s3-schema

  10. Select the + in the SQLEditor to create a new query. This will run using trino and you should see rows returned.

select * from demo.default.wine_quality;

4-superset-sql-select-all

  1. We can see the Finished queries that ran when logged into the trino ui and drill down into them.
echo -e https://$(oc get route trino --template='{{ .spec.host }}' -n ${PROJECT_NAME})

4-trino-finished

  1. In superset SQLEditor select CREATE CHART button from our select statement. Create different charts e.g. Bar Chart, Heat Graph, TreeMap etc.

4-superset-graph-treemap

  1. And publish these to the Dashboard.

4-superset-dashboard