مرکز منطقه ای اطلاع رساني علوم و فناوري - Benchmarking attribute cardinality maps for database systems using the TPC-D specifications

Abstract :

Benchmarking is an important phase in developing any new software technique because it helps to validate the underlying theory in the specific problem domain. But benchmarking of new software strategies is a very complex problem, because it is difficult (if not impossible) to test, validate and verify the results of the various schemes in completely different settings. This is even more true in the case of database systems because the benchmarking also depends on the types of queries presented to the databases used in the benchmarking experiments. Query optimization strategies in relational database systems rely on approximately estimating the query result sizes to minimize the response time for user-queries. Among the many query result size estimation techniques, the histogram-based techniques are by far the most commonly used ones in modern-day database systems. These techniques estimate the query result sizes by approximating the underlying data distributions, and, thus, are prone to estimation errors. In two recent works , we proposed (and thoroughly analyzed) two new forms of histogram-like techniques called the rectangular and trapezoidal attribute cardinality maps (ACM), respectively, that give much smaller estimation errors than the traditional equi-width and equi-depth histograms currently being used by many commercial database systems. This paper reports how the benchmarking of the Rectangular-ACM (R-ACM) and the Trapezoidal-ACM (T-ACM) for query optimization can be achieved. By conducting an extensive set of experiments using the acclaimed TPC-D benchmark queries and database , we demonstrate that these new ACM schemes are much more accurate than the traditional histograms for query result size estimation. Apart from demonstrating the power of the ACMs, this paper also shows how the TPC-D benchmarking can be achieved using a large synthetic database with many different patterns of synthetic queries, which are representative of a real-world business environment.

Keywords :

benchmark testing; query processing; relational databases; software engineering; TPC-D benchmark queries; attribute cardinality maps benchmarking; benchmarking; histogram-like techniques; query optimization; relational database systems; software technique; Benchmark testing; Business; Councils; Database systems; Delay; Estimation error; Histograms; Query processing; Relational databases; Software testing;