• DocumentCode
    3165626
  • Title

    Parallel Mining of Frequent Closed Patterns: Harnessing Modern Computer Architectures

  • Author

    Lucchese, Claudio ; Orlando, Salvatore ; Perego, Raffaele

  • Author_Institution
    Ca´´ Foscari Univ. Venice, Venice
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    242
  • Lastpage
    251
  • Abstract
    Inspired by emerging multi-core computer architectures, in this paper we present MT_CLOSED, a multi-threaded algorithm for frequent closed itemset mining (FCIM). To the best of our knowledge, this is the first FCIM parallel algorithm proposed so far. We studied how different duplicate checking techniques, typical of FCIM algorithms, may affect this parallelization. We showed that only one of them allows to decompose the global FCIM problem into independent tasks that can be executed in any order, and thus in parallel. Finally we show how MT_Closed efficiently harness modern CPUs. We designed and tested several parallelization paradigms by investigating static/dynamic decomposition and scheduling of tasks, thus showing its scalability w.r.t. to the number of CPUs. We analyzed the cache friendliness of the algorithm. Finally, we provided additional speed-up by introducing SIMD extensions.
  • Keywords
    computer architecture; data mining; multi-threading; parallel algorithms; scheduling; MT_CLOSED; central processing unit; duplicate checking; frequent closed itemset mining; frequent closed pattern; multicore computer architecture; multithreaded algorithm; parallel algorithm; parallel mining; parallelization; task scheduling; Algorithm design and analysis; Computer architecture; Coprocessors; Data mining; Data structures; Dynamic scheduling; Itemsets; Parallel algorithms; Parallel processing; Yarn;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3018-5
  • Type

    conf

  • DOI
    10.1109/ICDM.2007.13
  • Filename
    4470248