Create a simple dataset from S3 using Trino and Hive, and explore it with Superset
-
Log in to Superset as your USER_NAME, using the OpenShift v4 button and the FreeIPA identity provider. The second command below prints the Superset URL.
oc login --server=https://api.${CLUSTER_DOMAIN##apps.}:6443 -u <USER_NAME> -p <PASSWORD>
echo -e https://$(oc get route superset --template='{{ .spec.host }}' -n ${PROJECT_NAME})
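The `${CLUSTER_DOMAIN##apps.}` expansion in the login command strips a leading `apps.` from the cluster domain so that the API hostname can be built from it. A minimal sketch of that expansion, assuming a hypothetical `CLUSTER_DOMAIN` value of `apps.example.com` (the `BASE_DOMAIN` and `API_URL` variable names are illustrative only):

```shell
# Hypothetical value; substitute your own cluster's apps domain.
CLUSTER_DOMAIN="apps.example.com"

# '##apps.' removes the leading "apps." prefix, if present.
BASE_DOMAIN="${CLUSTER_DOMAIN##apps.}"
API_URL="https://api.${BASE_DOMAIN}:6443"

echo "${API_URL}"   # prints https://api.example.com:6443
```

If `CLUSTER_DOMAIN` does not start with `apps.`, the expansion leaves it unchanged.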
-
Browse to Data > Databases
-
Select the + Database button
-
Choose Trino and supply the SQLAlchemy URI. We connect using our LDAP user credentials.
trino://${USER_NAME}:${PASSWORD}@trino-service:8443
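Because the URI embeds the credentials directly, characters such as `@`, `:` or `/` in the password would be read as URI delimiters unless they are percent-encoded. The following is only a sketch, using hypothetical placeholder credentials, that assembles the URI and warns when the password looks like it needs encoding:

```shell
# Hypothetical placeholder credentials, for illustration only.
USER_NAME="alice"
PASSWORD="s3cret"

# Warn if the password contains characters that act as URI delimiters.
case "${PASSWORD}" in
  *[@:/]*) echo "warning: percent-encode special characters in the password" >&2 ;;
esac

# Same URI shape as used in the Superset connection form.
TRINO_URI="trino://${USER_NAME}:${PASSWORD}@trino-service:8443"
echo "${TRINO_URI}"
```

Pasting the printed value into the SQLAlchemy URI field avoids typing the credentials by hand.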
-
Add the Trino SSL CA certificate to the Advanced > Security > ROOT CERTIFICATE section. Print the certificate with:
cat /projects/rainforest/supply-chain/trino/trino-certs/ca.crt
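Before pasting the certificate, it can help to confirm that the file is readable and parses as an X.509 certificate. A minimal sketch, assuming `openssl` is installed; the `check_ca_cert` helper name is hypothetical and not part of the exercise:

```shell
# Hypothetical helper: print subject, issuer and expiry of a PEM certificate.
check_ca_cert() {
  cert="$1"
  if [ ! -r "${cert}" ]; then
    echo "cannot read ${cert}" >&2
    return 1
  fi
  # Fails with a non-zero status if the file is not a valid certificate.
  openssl x509 -in "${cert}" -noout -subject -issuer -enddate
}

# Usage, with the path from the step above:
#   check_ca_cert /projects/rainforest/supply-chain/trino/trino-certs/ca.crt
```

An expired or unparseable certificate here is one possible cause of a failed connection in the later Connect step.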
-
Under Advanced > SQL Lab, select these tick boxes:
Expose database in SQL Lab, Allow CREATE TABLE AS, Allow DML, Allow Multi Schema Metadata Fetch, Enable query cost estimation, Allow this database to be explored
-
Select Connect. If the settings are correct, the connection will be created. If it fails, double-check that the SSL steps for the Trino truststore in the Secrets section were completed.
-
Browse to SQL Lab > SQL Editor
-
Create a table in the Hive catalog from our wine_quality.csv data stored in S3 (make sure you copied the CSV file into S3 during the Spark exercise)
CREATE TABLE demo.default.wine_quality (
    "fixed acidity" VARCHAR,
    "volatile acidity" VARCHAR,
    "citric acid" VARCHAR,
    "residual sugar" VARCHAR,
    "chlorides" VARCHAR,
    "free sulfur dioxide" VARCHAR,
    "total sulfur dioxide" VARCHAR,
    "density" VARCHAR,
    "pH" VARCHAR,
    "sulphates" VARCHAR,
    "alcohol" VARCHAR,
    "quality" VARCHAR
)
WITH (
    FORMAT = 'CSV',
    skip_header_line_count = 1,
    EXTERNAL_LOCATION = 's3a://data/'
)
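The table definition declares 12 columns and skips one header line, so the CSV file is expected to be comma-separated with a 12-field header. A quick sanity check on a local copy can be sketched as follows; the temporary file here is a hypothetical stand-in for wine_quality.csv, and the naive comma split assumes no quoted field contains a comma:

```shell
# Hypothetical stand-in for the header line of wine_quality.csv.
csv_file="$(mktemp)"
cat > "${csv_file}" <<'EOF'
"fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol","quality"
EOF

# Count comma-separated fields in the header line.
ncols="$(head -n 1 "${csv_file}" | awk -F',' '{ print NF }')"
echo "header columns: ${ncols}"   # prints header columns: 12

rm -f "${csv_file}"
```

A different count would suggest that the file uses another delimiter or a different column layout than the table expects.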
-
Select the + in the SQL Editor to create a new query. The query below runs on Trino, and you should see rows returned.
select * from demo.default.wine_quality;
- We can see the finished queries, and drill down into them, by logging in to the Trino UI.
echo -e https://$(oc get route trino --template='{{ .spec.host }}' -n ${PROJECT_NAME})
- In the Superset SQL Editor, select the CREATE CHART button to build a chart from our select statement. Create different charts, e.g. Bar Chart, Heatmap, Treemap.
- Publish these charts to a dashboard.