• DocumentCode
    1664982
  • Title
    PaWI: Parallel Weighted Itemset Mining by Means of MapReduce
  • Author
    Baralis, Elena ; Cagliero, Luca ; Garza, Paolo ; Grimaudo, Luigi
  • Author_Institution
    Dipt. di Autom. e Inf., Politec. di Torino, Turin, Italy
  • fYear
    2015
  • Firstpage
    25
  • Lastpage
    32
  • Abstract
    Frequent item set mining is an exploratory data mining technique that has been fruitfully exploited to extract recurrent co-occurrences between data items. Since in many application contexts items are enriched with weights denoting their relative importance in the analyzed data, pushing item weights into the item set mining process, i.e., mining weighted item sets rather than traditional item sets, is an appealing research direction. Although many efficient in-memory weighted item set mining algorithms are available in the literature, there is a lack of parallel and distributed solutions that can scale towards Big Weighted Data. This paper presents a scalable frequent weighted item set mining algorithm based on the MapReduce paradigm. To demonstrate its actionability and scalability, the proposed algorithm was tested on a real big dataset collecting approximately 34 million reviews of Amazon items. Weights indicate the ratings given by users to the purchased items. The mined item sets represent combinations of items that were frequently bought together with an overall rating above average.
  • Keywords
    Big Data; data mining; parallel processing; MapReduce; PaWI; data mining technique; frequent item set mining; parallel weighted item set mining; Algorithm design and analysis; Clustering algorithms; Itemsets; Scalability; Weight measurement; H.2.8.b Clustering, classification, and association rules
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type
    conf
  • DOI
    10.1109/BigDataCongress.2015.14
  • Filename
    7207198