TPC-H
The TPC-H Benchmark sample data
Last updated
The TPC-H Benchmark sample data
Last updated
As described in the TPC Benchmarkβ’ H (TPC-H) specification:
βTPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.β
TPC-H comes with various data set sizes to test different scaling factors. Spice.ai TPC-H dataset consists of the base scale factor (several million elements) and is available under tpch
schema . The columns and their schema available for each table can be viewed with the describe
command:
The components of TPC-H consist of eight separate and individual tables (the Base Tables). The relationships between columns in these tables are illustrated in the following ER diagram:
TPC-H comes with pre-defined queries to ask different business questions. Query below reports the amount of business that was billed, shipped, and returned. More information about TPC-H and all the queries involved can be found in the official TPC Benchmark H Standard Specification.
The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS.