Title :
P-Mine: Parallel itemset mining on large datasets
Author :
Baralis, Elena ; Cerquitelli, Tania ; Chiusano, Silvia ; Grand, Anais
Author_Institution :
Dipt. di Autom. e Inf., Politec. di Torino, Turin, Italy
Abstract :
Itemset mining is a well-known exploratory technique used to discover interesting correlations hidden in a data collection. Since ever-increasing amounts of data are being collected and stored (e.g., business transactions, medical and biological data, context-aware applications), scalable and efficient approaches are needed to analyze these large data collections. This paper proposes a parallel disk-based approach to efficiently support frequent itemset mining on a multi-core processor. Our parallel strategy is presented in the context of the VLDB-Mine persistent data structure. Different techniques have been proposed to optimize both data- and compute-intensive aspects of the mining algorithm. Preliminary experiments, performed on both real and synthetic datasets, show promising results in improving the efficiency and scalability of the mining activity on large datasets.
Keywords :
data mining; data structures; multiprocessing systems; parallel processing; P-Mine; VLDB-Mine persistent data structure; data collection; frequent itemset mining; large datasets; multicore processor; parallel disk-based approach; parallel itemset mining; parallel strategy; real datasets; synthetic datasets; Data mining; Data structures; Itemsets; Multicore processing; Prefetching; Scalability;
Conference_Title :
Data Engineering Workshops (ICDEW), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-5303-8
Electronic_ISBN :
978-1-4673-5302-1
DOI :
10.1109/ICDEW.2013.6547461