• DocumentCode
    659593
  • Title
    Frequent Itemset Mining for Big Data
  • Author
    Moens, Sandy; Aksehirli, Emin; Goethals, Bart
  • Author_Institution
    Univ. Antwerpen, Antwerp, Belgium
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    111
  • Lastpage
    118
  • Abstract
    Frequent Itemset Mining (FIM) is one of the best-known techniques for extracting knowledge from data. The combinatorial explosion of FIM methods becomes even more problematic when they are applied to Big Data. Fortunately, recent improvements in the field of parallel programming already provide good tools to tackle this problem. However, these tools come with their own technical challenges, e.g., balanced data distribution and inter-communication costs. In this paper, we investigate the applicability of FIM techniques on the MapReduce platform. We introduce two new methods for mining large datasets: Dist-Eclat focuses on speed, while BigFIM is optimized to run on really large datasets. In our experiments, we show the scalability of our methods.
  • Keywords
    Big Data; data mining; parallel programming; BigFIM; Dist-Eclat; MapReduce platform; frequent itemset mining; knowledge extraction; Data handling; Data storage systems; Information management; Itemsets; Partitioning algorithms; distributed data mining; eclat; hadoop; mapreduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691742
  • Filename
    6691742