# Hive JDBC Performance Testing Tool (perf)

A JDBC performance testing tool that provides connection timing details and rolling windows of performance for long-running queries. Each window reports not only record counts but also an estimate of the data volume.
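
For reference, a minimal invocation might look like the sketch below. The host, database, and query are placeholders; the flags mirror the Knox and Kerberos examples later in this document.

```bash
# Minimal sketch of a basic run (placeholder URL and query).
URL="jdbc:hive2://hs2-host.example.com:10000/default"
QUERY="SELECT field1_1,field1_2 FROM perf_test.wide_table"
BATCH_SIZE=10000

hive-sre perf -u "${URL}" -e "${QUERY}" -b "${BATCH_SIZE}" -n "${USER}" -p "<password>"
```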

## Example Output

```
========== v.2.0.1-SNAPSHOT ===========
URL        : jdbc:hive2://os04.streever.local:2181,os05.streever.local:2181,os10.streever.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/[email protected]
Batch Size : 10000
SQL        : SELECT field1_1,field1_2,field1_3,field1_4 FROM perf_test.wide_table
Lite       : false
----------------------------
Connect Attempt  : 0ms
Connected        : 2201ms
Create Statement : 2205ms
Before Query     : 2205ms
Query Return     : 2697ms
Start Iterating Results    : 2698ms
Completed Iterating Results: 79408ms
Statement Closed           : 79452ms
Resultset Closed           : 79452ms
Process Completed          : 79471ms

----------------------------
Window Length(ms) | Record Average | Records per/sec | Data Size per/sec
60000               7710000          128500            14970260
180000              10020000         125250            14642405
300000              10020000         125250            14642405
600000              10020000         125250            14642405

===========================
Running for: 80966ms    Started: 2020-03-06 13:57:40.492    Record Count: 10020000    Data Size: 1171392406
```
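
As a rough sanity check on the summary line (an assumed relationship, not a description of the tool's internals): the per-second rates in the longer windows are consistent with the total record count divided by the processing time, and the data-size rate with records per second times the average record size.

```bash
# Rough arithmetic check of the summary above (assumed relationships only).
RECORDS=10020000          # Record Count from the summary line
BYTES=1171392406          # Data Size from the summary line
ELAPSED_SEC=80            # ~79,471ms of processing, rounded to 80s

# Records per second over the full run: 10020000 / 80 = 125250
echo "records/sec ≈ $(( RECORDS / ELAPSED_SEC ))"

# Average record size (~117 bytes) times records/sec ≈ 14,642,405 bytes/sec,
# matching the "Data Size per/sec" column for the longer windows.
awk -v r="$RECORDS" -v b="$BYTES" -v t="$ELAPSED_SEC" \
    'BEGIN { printf "bytes/sec ≈ %.0f\n", (r / t) * (b / r) }'
```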

## Environment and Connection via Knox

Note: an additional classpath setting (via `hadoop classpath`) is only required when connecting to a Kerberized endpoint; this Knox example does not need it.

URL="jdbc:hive2://os06.streever.local:8443/;ssl=true;sslTrustStore=/home/dstreev/certs/bm90-gateway.jks;trustStorePassword=hortonworks;transportMode=http;httpPath=gateway/default/hive"
QUERY="SELECT field1_1,field1_2,field1_3,field1_4 FROM perf_test.wide_table"
BATCH_SIZE=10000
PW=<set_me>

hive-sre perf -u "${URL}" -e "${QUERY}" -b $BATCH_SIZE -n ${USER} -p <password> 
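
To avoid leaving the password in shell history or the script itself, one option is to prompt for it at run time; a minimal sketch, assuming the `-p` flag takes the password value as shown above:

```bash
# Prompt for the password instead of hard-coding it (same flags as above).
read -r -s -p "Hive password for ${USER}: " PW
echo

hive-sre perf -u "${URL}" -e "${QUERY}" -b "${BATCH_SIZE}" -n "${USER}" -p "${PW}"
```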

## Environment and Connection via Kerberos from Edge

Note: additional Hadoop libraries are required for a Kerberized connection. Use `--hadoop-classpath` on the command line to call the environment's `hadoop classpath` and add it to the application's classpath.

URL="jdbc:hive2://os05.streever.local:10601/default;httpPath=cliservice;principal=hive/[email protected];transportMode=http"
QUERY="SELECT field1_1,field1_2,field1_3,field1_4 FROM perf_test.wide_table"
# Note that `hadoop classpath` statement to bring in all necessary libs.
BATCH_SIZE=10000

hive-sre --hadoop-classpath perf -u "${URL}" -e "${QUERY}" -b $BATCH_SIZE 
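
The session also needs a valid Kerberos ticket before the tool is run; a minimal sketch using the standard `kinit`/`klist` commands (the keytab path and user principal below are placeholders for your environment):

```bash
# Acquire and verify a Kerberos ticket first; the keytab path and principal
# here are placeholders for your environment.
kinit -kt /etc/security/keytabs/myuser.keytab [email protected]
klist    # confirm the ticket before running the hive-sre command above
```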

## Environment and Connection via Kerberos from a Client Host (Non-Edge)

Even with a valid Kerberos ticket, this type of host does not have all of the Hadoop libraries that `hadoop classpath` provides on an edge node, so the connection will not work. I have not yet found the right mix of classes to add to the 'uber' jar to get this working.