• DocumentCode
    3570870
  • Title

    SMOPD-C: An autonomous vertical partitioning technique for distributed databases on cluster computers

  • Author

    Li, Liangzhe ; Gruenwald, Le

  • Author_Institution
    Sch. of Comput. Sci., Univ. of Oklahoma, Norman, OK, USA
  • fYear
    2014
  • Firstpage
    171
  • Lastpage
    178
  • Abstract
    Distributed databases on cluster computers are widely used in many applications. With the volume of data getting bigger and bigger and the velocity of data getting faster and faster, it is important to develop techniques that can improve query response time to meet applications' needs. Database vertical partitioning that splits a database table into smaller tables containing fewer attributes in order to reduce disk I/Os is one of those techniques. While many algorithms have been developed for database vertical partitioning, none of them is designed to partition the database stored in cluster computers dynamically, i.e., without human interference and without fixed query workloads. To fill this gap, this paper introduces a dynamic algorithm, SMOPD-C, that can autonomously partition a distributed database vertically on cluster computers, determine when a database re-partitioning is needed, and re-partition the database accordingly. The paper then presents comprehensive experiments that were conducted to study the performance of SMOPD-C using the TPC-H benchmark on a cluster computer. The experiment results show that SMOPD-C is capable of performing database re-partitioning dynamically with high accuracy to provide better query cost than the current partitioning configuration.
  • Keywords
    distributed databases; input-output programs; query processing; software performance evaluation; workstation clusters; SMOPD-C; TPC-H benchmark; autonomous vertical partitioning technique; cluster computers; data velocity; data volume; database repartitioning; database table; database vertical partitioning; disk I-O reduction; distributed databases; dynamic algorithm; query cost; query response time improvement; Algorithm design and analysis; Clustering algorithms; Computers; Distributed databases; Heuristic algorithms; Partitioning algorithms; cluster computer; physical read mainly queries; query optimizer; vertical partitioning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on
  • Type

    conf

  • DOI
    10.1109/IRI.2014.7051887
  • Filename
    7051887