• DocumentCode
    3215108
  • Title
    Parallel implementation of WAP-tree mining algorithm
  • Author
    Wu, Ming ; Chung, Moon Jung ; Moonesinghe, H.D.K.
  • Author_Institution
    Dept. of CSE, Michigan State Univ., USA
  • fYear
    2004
  • fDate
    7-9 July 2004
  • Firstpage
    135
  • Lastpage
    142
  • Abstract
    In this paper, we present parallel algorithms for Web log mining and a performance prediction model. The algorithm, based on the WAP-tree, scans the dataset only twice and avoids the candidate generation process. We parallelized the mining part of the WAP-tree algorithm. To balance the workload among processors, we developed a task scheduling strategy. A performance model of the parallel Web mining algorithm is also developed to predict the performance of the parallel implementation. This model shows that we can get linear speedup for a small number of processors, and that the speedup gain diminishes as the number of processors increases. Using the performance model, we can also estimate the maximum speedup. We implemented the algorithm on the Pittsburgh Supercomputing Center's Lemieux system using up to 48 processors. Our benchmark results showed that the performance model correctly predicts the performance of the parallel implementation. We achieved good speedup as the size of the dataset increased.
  • Keywords
    Internet; data mining; parallel algorithms; performance evaluation; processor scheduling; protocols; resource allocation; Pittsburg Super Computer Center Lemieux; WAP-tree mining; Web log mining; candidate generation process; dataset scanning; linear processing speedup; maximum speed up; parallel Web mining; parallel algorithm; performance prediction model; task scheduling; workload balancing; Costs; Data mining; Explosives; Information analysis; Moon; Parallel algorithms; Predictive models; Processor scheduling; Web mining; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on
  • ISSN
    1521-9097
  • Print_ISBN
    0-7695-2152-5
  • Type
    conf
  • DOI
    10.1109/ICPADS.2004.1316089
  • Filename
    1316089