• DocumentCode
    1877233
  • Title

    SCRAP: A Statistical Approach for Creating a Database Query Workload Based on Performance Bottlenecks

  • Author

    Skarie, James ; Debnath, Biplob K. ; Lilja, David J. ; Mokbel, Mohamed F.

  • Author_Institution
    Electrical and Computer Engineering Department, University of Minnesota, USA. skar0059@umn.edu
  • fYear
    2007
  • fDate
    27-29 Sept. 2007
  • Firstpage
    183
  • Lastpage
    192
  • Abstract
    With the tremendous growth in stored data, the role of database systems has become more significant than ever before. Standard query workloads, such as the TPC-C and TPC-H benchmark suites, are used to evaluate and tune the functionality and performance of database systems. Running and configuring benchmarks is a time-consuming task. It requires substantial statistical expertise due to the enormous data size and large number of queries in the workload. Subsetting can be used to reduce the number of queries in a workload. An existing workload subsetting technique selects queries based on similarities of the ranks of the queries for low-level characteristics, such as cache miss rates, or based on the execution time required on different computer systems. However, many low-level characteristics are correlated and produce similar behaviors. Also, raw execution time as a metric is too diffuse to capture important performance bottlenecks. Our goal is to select a subset of queries that can reproduce the same bottlenecks in the system as the original workload. In this paper, we propose a statistical approach for creating a database query workload based on performance bottlenecks (SCRAP). Our methodology takes a query workload and a set of system configuration parameters as inputs, and selects a subset of the queries from the workload based on the similarity of performance bottlenecks. Experimental results using the TPC-H benchmark and the PostgreSQL database system show that the reduced workload and the original workload produce similar performance bottlenecks, and that the subset accurately estimates the total execution time.
  • Keywords
    Buildings; Computer science; Costs; Data engineering; Database systems; Indexes; Internet; Runtime; System performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Workload Characterization, 2007. IISWC 2007. IEEE 10th International Symposium on
  • Conference_Location
    Boston, MA, USA
  • Print_ISBN
    978-1-4244-1561-8
  • Electronic_ISBN
    978-1-4244-1562-5
  • Type

    conf

  • DOI
    10.1109/IISWC.2007.4362194
  • Filename
    4362194