• DocumentCode
    1920957
  • Title

    Pre-clustering based sequential pattern mining

  • Author

    Wu, Shaochun ; Wu, Gengfeng ; Jin, Shenjie

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Shanghai Univ., China
  • fYear
    2004
  • fDate
    14-16 Sept. 2004
  • Firstpage
    1008
  • Lastpage
    1013
  • Abstract
    Sequential pattern mining is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the potentially large number of candidate patterns demand efficient and scalable algorithms. In this paper, we present an efficient parallel algorithm named pre-clustering based sequential pattern mining (PCSPM). The algorithm groups sequence data into clusters according to a similarity definition, then distributes the clusters to the nodes of a distributed-memory parallel computer and forms node sets according to the clusters. By confining most communication to within each node set, it greatly reduces unnecessary communication among parallel computing nodes and therefore saves considerable communication time. The experimental results and the accompanying analysis show that the PCSPM algorithm is efficient and feasible.
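    The abstract does not give the similarity definition or the distribution rule, so the following is only a rough sketch of the general idea (cluster first, then map each cluster to its own node set). Everything in it is an assumption made for illustration: the Jaccard similarity over item sets (which ignores item order), the greedy leader-style clustering, the round-robin node assignment, and the names `jaccard`, `pre_cluster`, and `assign_node_sets`. It is not the PCSPM algorithm as published.

```python
from typing import Dict, List

Sequence = List[str]


def jaccard(a: Sequence, b: Sequence) -> float:
    """Assumed similarity: Jaccard index of the item sets (order is ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def pre_cluster(sequences: List[Sequence], threshold: float) -> List[List[Sequence]]:
    """Greedy clustering: a sequence joins the first cluster whose leader
    (first member) is at least `threshold` similar, else it starts a new one."""
    clusters: List[List[Sequence]] = []
    for seq in sequences:
        for cluster in clusters:
            if jaccard(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def assign_node_sets(num_clusters: int, num_nodes: int) -> Dict[int, List[int]]:
    """Map each cluster index to a list of node ids (its hypothetical node set).
    With enough nodes, nodes are dealt round-robin across clusters; otherwise
    several clusters share a node."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    node_sets: Dict[int, List[int]] = {c: [] for c in range(num_clusters)}
    if num_clusters == 0:
        return node_sets
    if num_nodes >= num_clusters:
        for node in range(num_nodes):
            node_sets[node % num_clusters].append(node)
    else:
        for c in range(num_clusters):
            node_sets[c].append(c % num_nodes)
    return node_sets


sequences = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["x", "y", "z"], ["a", "c", "b"]]
clusters = pre_cluster(sequences, threshold=0.5)
node_sets = assign_node_sets(len(clusters), num_nodes=4)
```

    In this toy run the five sequences fall into two clusters, and each cluster receives two of the four nodes. Under the abstract's description, pattern mining for a cluster would then exchange data mostly among the nodes of that cluster's node set.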
  • Keywords
    data mining; parallel algorithms; pattern clustering; sequences; PCSPM algorithm; candidate patterns; data distribution; distributed memory parallel computer; parallel algorithm; parallel computing nodes; preclustering based sequential pattern mining; sequence data; similarity definition; Algorithm design and analysis; Clustering algorithms; Concurrent computing; Distributed computing; Parallel algorithms; Parallel processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on
  • Print_ISBN
    0-7695-2216-5
  • Type

    conf

  • DOI
    10.1109/CIT.2004.1357328
  • Filename
    1357328