DocumentCode :
1791731
Title :
Parallel and quantitative sequential pattern mining for large-scale interval-based temporal data
Author :
Guangchen Ruan; Hui Zhang; Beth Plale
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
32
Lastpage :
39
Abstract :
Mining frequent subsequences of patterns, or sequential pattern mining, has wide application in customer shopping sequence analysis, web log stream analysis, and multi-modal behavioral studies, to name a few. Detecting unknown, anomalous, and unexpected patterns in large-scale interval-based temporal data without complete a priori knowledge is challenging. In this paper, we present PESMiner, a framework that allows parallel and quantitative mining of sequential patterns at scale. Whereas most existing sequential mining algorithms can only find sequential orders of temporal events, our work presents a novel interactive temporal data mining algorithm capable of extracting precise temporal properties of sequential patterns. Furthermore, our work provides a unified parallel solution that scales our algorithms to larger temporal data sets by exploiting iterative MapReduce tasks. Comprehensive performance evaluations demonstrate that PESMiner significantly outperforms existing interval-based mining algorithms in terms of both quality (i.e., accuracy, precision, and recall) and scalability.
Keywords :
Internet; data mining; parallel processing; PESMiner; Web log stream analysis; customer shopping sequence analysis; interactive temporal data mining algorithm; iterative MapReduce tasks; large-scale interval-based temporal data; parallel sequential pattern mining; quantitative sequential pattern mining; unified parallel solution; Algorithm design and analysis; Clustering algorithms; Data mining; Educational institutions; Pattern matching; Prototypes; Web services; interval-based temporal data; iterative MapReduce; quantitative sequential pattern mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2014 IEEE International Conference on Big Data (Big Data)
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004410
Filename :
7004410