Test Data
Paul Rogers edited this page Feb 25, 2017
When developing Drill, it is handy to have a variety of test data available. Below is a partial list of such resources.
- Sample Datasets from Drill documentation.
Drill includes the TPC-H data and queries. Scan the specification for details, especially the ER diagram on page 11 (reproduced below).
(insert image)
Compared to the TPC-H schema, Drill adds a column prefix of the form "x_", for example:
- customer: c_
- orders: o_
- lineitem: l_
Other notes:
- TestTpchDistributedConcurrent tests a variety of TPC-H queries. Look at it for links to the queries and data.
- Queries are in drill-java-exec/src/test/resources/queries/tpch.
- Data is available to Drill in cp.`tpch/something.parquet`
- Data is packaged in tpch-sample-data-x.y.z.jar
- Data is also available on the class path in the folder: contrib/data/tpch-sample-data/target/classes/tpch
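As a rough illustration of how the classpath location and the column prefixes fit together, the sketch below builds a table reference and a simple query over it. The class and method names (TpchSample, classpathTable, countByColumn) are hypothetical helpers invented for this example and are not part of Drill. The table and column names are taken from the schema listing on this page.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Drill.
final class TpchSample {
  private TpchSample() {}

  // Builds the classpath reference for a TPC-H sample table,
  // for example "customer" -> cp.`tpch/customer.parquet`.
  static String classpathTable(String table) {
    return "cp.`tpch/" + table.toLowerCase(Locale.ROOT) + ".parquet`";
  }

  // Builds a simple aggregate query that counts rows per value of one column.
  static String countByColumn(String table, String column) {
    return "SELECT " + column + ", COUNT(*) AS cnt"
        + " FROM " + classpathTable(table)
        + " GROUP BY " + column;
  }
}
```

For instance, TpchSample.countByColumn("customer", "c_mktsegment") yields a query over cp.`tpch/customer.parquet` grouped by c_mktsegment, which could then be submitted through whatever client or test fixture is at hand.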
As reported by parquet-tools schema:
customer.parquet
message root {
required int32 c_custkey;
required binary c_name (UTF8);
required binary c_address (UTF8);
required int32 c_nationkey;
required binary c_phone (UTF8);
required double c_acctbal;
required binary c_mktsegment (UTF8);
required binary c_comment (UTF8);
}
lineitem.parquet
message root {
required int32 l_orderkey;
required int32 l_partkey;
required int32 l_suppkey;
required int32 l_linenumber;
required double l_quantity;
required double l_extendedprice;
required double l_discount;
required double l_tax;
required binary l_returnflag (UTF8);
required binary l_linestatus (UTF8);
required int32 l_shipdate (DATE);
required int32 l_commitdate (DATE);
required int32 l_receiptdate (DATE);
required binary l_shipinstruct (UTF8);
required binary l_shipmode (UTF8);
required binary l_comment (UTF8);
}
nation.parquet
message root {
required int32 n_nationkey;
required binary n_name (UTF8);
required int32 n_regionkey;
required binary n_comment (UTF8);
}
orders.parquet
message root {
required int32 o_orderkey;
required int32 o_custkey;
required binary o_orderstatus (UTF8);
required double o_totalprice;
required int32 o_orderdate (DATE);
required binary o_orderpriority (UTF8);
required binary o_clerk (UTF8);
required int32 o_shippriority;
required binary o_comment (UTF8);
}
part.parquet
message root {
required int32 p_partkey;
required binary p_name (UTF8);
required binary p_mfgr (UTF8);
required binary p_brand (UTF8);
required binary p_type (UTF8);
required int32 p_size;
required binary p_container (UTF8);
required double p_retailprice;
required binary p_comment (UTF8);
}
partsupp.parquet
message root {
required int32 ps_partkey;
required int32 ps_suppkey;
required int32 ps_availqty;
required double ps_supplycost;
required binary ps_comment (UTF8);
}
region.parquet
message root {
required int32 r_regionkey;
required binary r_name (UTF8);
required binary r_comment (UTF8);
}
supplier.parquet
message root {
required int32 s_suppkey;
required binary s_name (UTF8);
required binary s_address (UTF8);
required int32 s_nationkey;
required binary s_phone (UTF8);
required double s_acctbal;
required binary s_comment (UTF8);
}
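When writing tests against these files it can be convenient to pull the column names out of a schema listing like the ones above. The following is a minimal sketch, assuming flat schemas with one primitive field per line and at most a simple one-word annotation such as (UTF8) or (DATE). SchemaColumns and columnNames are hypothetical names for this example and are not part of Drill or parquet-tools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Drill or parquet-tools.
final class SchemaColumns {
  // Matches a field line such as "required binary c_name (UTF8);".
  // Groups: 1 = repetition, 2 = physical type, 3 = column name,
  // 4 = optional one-word annotation.
  private static final Pattern FIELD = Pattern.compile(
      "^\\s*(required|optional|repeated)\\s+(\\w+)\\s+(\\w+)(?:\\s+\\((\\w+)\\))?\\s*;\\s*$");

  private SchemaColumns() {}

  // Returns the column names in declaration order. The "message root {"
  // and "}" lines do not match the pattern and are skipped.
  static List<String> columnNames(String schemaText) {
    List<String> names = new ArrayList<>();
    for (String line : schemaText.split("\\R")) {
      Matcher m = FIELD.matcher(line);
      if (m.matches()) {
        names.add(m.group(3));
      }
    }
    return names;
  }
}
```

Nested groups and parameterized annotations are deliberately out of scope here; none of the TPC-H sample schemas listed above use them.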
Drill ships the FoodMart data set maintained by Julian Hyde, adapted from the original Microsoft version.