• DocumentCode
    2264504
  • Title
    Parallel leap: large-scale maximal pattern mining in a distributed environment
  • Author
    El-Hajj, Mohammad ; Zaïane, Osmar R.
  • Author_Institution
    Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta.
  • Volume
    1
  • fYear
    0
  • fDate
    0-0 0
  • Abstract
    When computationally feasible, mining extremely large databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets because of their sheer size: not only the number of existing patterns but, mainly, the magnitude of the search space. Many approaches have been suggested, such as sequential mining for maximal patterns or searching for all frequent patterns in parallel. So far, those approaches are still not genuinely effective for mining extremely large datasets. In this work we propose a method that combines both strategies efficiently, i.e., mining in parallel for the set of maximal patterns, which, to the best of our knowledge, has never been done efficiently before. Using this approach we could mine significantly large datasets, with sizes never before reported in the literature. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than 2 hours.
  • Keywords
    data mining; parallel processing; pattern clustering; very large databases; database frequent pattern; database mining; distributed environment; large dataset mining; maximal pattern mining; parallel mining; processor cluster; search space; Association rules; Data mining; Image databases; Itemsets; Large-scale systems; Pattern analysis; Radiofrequency identification; Satellite broadcasting; Surveillance; Transaction databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    1521-9097
  • Print_ISBN
    0-7695-2612-8
  • Type
    conf
  • DOI
    10.1109/ICPADS.2006.77
  • Filename
    1655657