• DocumentCode
    1436743
  • Title
    Parallel Frequent Item Set Mining with Selective Item Replication
  • Author
    Özkural, Eray ; Uçar, Bora ; Aykanat, Cevdet
  • Author_Institution
    Dept. of Comput. Eng., Bilkent Univ., Ankara, Turkey
  • Volume
    22
  • Issue
    10
  • fYear
    2011
  • Firstpage
    1632
  • Lastpage
    1640
  • Abstract
    We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails an amount of data replication, which may be reduced by assigning appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator and NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with a small-to-medium number of processors for synthetic and real-world databases.
  • Keywords
    data mining; database management systems; graph theory; resource allocation; NoClique; NoClique2; computational load balancing; graph partitioning; parallel frequent item set mining; selective item replication; subdatabases; transaction database distribution scheme; vertex separator; Data mining; Equations; Itemsets; Particle separators; Program processors; Parallel data mining; frequent item set mining; graph partitioning by vertex separator; mining methods and algorithms; selective data replication;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2011.32
  • Filename
    5703072
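
The distribution idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' NoClique or NoClique2 implementation: the function names (`build_f2_graph`, `is_vertex_separator`, `project`, `distribute`) are hypothetical, and the separator is assumed to be supplied by some external graph partitioner rather than computed here. The sketch builds the graph whose edges are frequent item sets of size two, checks that a given item split is a valid vertex separator, and derives the two subdatabases with the separator items replicated in both. Because every frequent item set forms a clique in this graph, no frequent item set can contain items from both sides of the separator, so each subdatabase could then be mined on its own.

```python
from collections import Counter
from itertools import combinations


def build_f2_graph(transactions, min_support):
    """Return (frequent_items, edges); edges are the frequent 2-item sets."""
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent)  # sorted so each pair has one canonical form
        pair_counts.update(combinations(kept, 2))
    edges = {p for p, c in pair_counts.items() if c >= min_support}
    return frequent, edges


def is_vertex_separator(edges, part_a, part_b):
    """True if no edge directly joins an item in part_a to one in part_b."""
    return not any(
        (u in part_a and v in part_b) or (u in part_b and v in part_a)
        for u, v in edges
    )


def project(transactions, items):
    """Restrict each transaction to `items`, dropping transactions left empty."""
    projected = []
    for t in transactions:
        kept = frozenset(t) & items
        if kept:
            projected.append(kept)
    return projected


def distribute(transactions, part_a, part_b, separator):
    """Build two subdatabases; separator items are replicated in both (assumed split)."""
    return (
        project(transactions, part_a | separator),
        project(transactions, part_b | separator),
    )
```

For example, with a hypothetical database in which item `s` co-occurs frequently with `a` and with `c`, but `a`/`b` never co-occur frequently with `c`/`d`, passing `part_a={"a", "b"}`, `part_b={"c", "d"}`, and `separator={"s"}` to `distribute` yields two subdatabases that both contain `s`. How the separator is chosen, and how vertex weights steer replication and load balance, is left out of this sketch.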