• DocumentCode
    1345684
  • Title
    Scalable parallel data mining for association rules
  • Author
    Han, Eui-Hong Sam ; Karypis, George ; Kumar, Vipin
  • Author_Institution
    Army HPC Res. Center, Minnesota Univ., Minneapolis, MN, USA
  • Volume
    12
  • Issue
    3
  • fYear
    2000
  • Firstpage
    337
  • Lastpage
    352
  • Abstract
    The authors propose two new parallel formulations of the Apriori algorithm (R. Agrawal and R. Srikant, 1994), which is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations, CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD and requires substantially less communication overhead than DD. However, IDD suffers from the added cost of communicating transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.
  • Keywords
    associative processing; data mining; parallel algorithms; transaction processing; very large databases; 128-processor Cray T3E; Apriori algorithm; CD algorithm; DD algorithm; HD algorithm; IDD algorithm; association rules; candidate set size; hash tree; hybrid algorithm; parallel formulations; scalable parallel data mining; Association rules; Concurrent computing; Costs; Data mining; High definition video; Intelligent structures; Parallel processing; Partitioning algorithms; Scalability; Transaction databases
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/69.846289
  • Filename
    846289