• DocumentCode
    785514
  • Title
    A genetic algorithm-based clustering approach for database partitioning
  • Author
    Cheng, Chun-Hung; Lee, Wing-Kin; Wong, Kam-Fai
  • Author_Institution
    Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China
  • Volume
    32
  • Issue
    3
  • fYear
    2002
  • fDate
    8/1/2002
  • Firstpage
    215
  • Lastpage
    230
  • Abstract
    In a typical distributed/parallel database system, a request mostly accesses a subset of the entire database. It is, therefore, natural to organize commonly accessed data together and to place them on nearby, preferably the same, machine(s)/site(s). For this reason, data partitioning and data allocation are performance-critical issues in distributed database application design. This paper addresses data partitioning, which requires the use of clustering. Although many clustering algorithms have been proposed, their performance has not been extensively studied. Moreover, the special problem structure in clustering is rarely exploited. We explore the use of a genetic search-based clustering algorithm for data partitioning to achieve high database retrieval performance. By formulating the underlying problem as a traveling salesman problem (TSP), we can take advantage of this particular structure. Three new operators for genetic algorithms (GAs) are also proposed, and experimental results indicate that they outperform other operators in solving the TSP. The proposed GA is then applied to the data-partitioning problem. Our computational study shows that our GA performs well for this application.
  • Keywords
    data handling; data structures; distributed databases; genetic algorithms; parallel databases; search problems; travelling salesman problems; data allocation; data partitioning; database partitioning; database retrieval performance; distributed database; experimental results; genetic algorithm-based clustering approach; genetic search-based clustering algorithm; optimization; parallel database; traveling salesman problem; Application software; Clustering algorithms; Database systems; Distributed databases; Genetics; Information retrieval; Partitioning algorithms; Relational databases; Spatial databases; Transaction databases;
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1094-6977
  • Type
    jour
  • DOI
    10.1109/TSMCC.2002.804444
  • Filename
    1097734