Abstract:
Mining frequent subsequences of patterns, or sequential pattern mining, has wide application in customer shopping sequence analysis, web log stream analysis, and multi-modal behavioral studies, to name a few. Detecting unknown, anomalous, and unexpected patterns in large-scale interval-based temporal data without complete a priori knowledge is challenging. In this paper, we present PESMiner, a framework that allows parallel and quantitative mining of sequential patterns at scale. Whereas most existing sequential mining algorithms can only find the sequential order of temporal events, our work presents a novel interactive temporal data mining algorithm capable of extracting precise temporal properties of sequential patterns. Furthermore, our work provides a unified parallel solution that scales our algorithms to larger temporal data sets by exploiting iterative MapReduce tasks. Comprehensive performance evaluations demonstrate that PESMiner significantly outperforms existing interval-based mining algorithms in terms of both quality (i.e., accuracy, precision, and recall) and scalability.
Keywords:
Internet; data mining; parallel processing; PESMiner; Web log stream analysis; customer shopping sequence analysis; interactive temporal data mining algorithm; iterative MapReduce tasks; large-scale interval-based temporal data; parallel sequential pattern mining; quantitative sequential pattern mining; unified parallel solution; Algorithm design and analysis; Clustering algorithms; Educational institutions; Pattern matching; Prototypes; Web services; interval-based temporal data; iterative MapReduce