DocumentCode :
1844358
Title :
Sequential Pattern Mining on Highly Similar and Dense Dataset
Author :
Jian Ding ; Meng Han
Author_Institution :
Sch. of Comput. Sci. & Eng., Beifang Univ. of Nat., Yinchuan, China
fYear :
2013
fDate :
21-23 June 2013
Firstpage :
762
Lastpage :
765
Abstract :
In recent years, a great deal of effort has gone into sequential pattern mining, but some challenges remain unresolved, such as large search spaces and ineffectiveness in handling highly similar, dense, and long sequences. This paper focuses on designing effective search-space pruning methods to accelerate the mining process. We present a novel structure, the Prefix-Frequent-Items Graph (PFI-Graph), which represents the prefix frequent items of other items in sequential patterns. Based on the PFI-Graph, we propose an efficient algorithm, Prefix-Frequent-Items PrefixSpan (PFI-PrefixSpan). It avoids redundant data scanning and can thus effectively speed up the discovery of new patterns. Extensive experimental results on real sequence datasets show that the proposed approach is substantially more efficient than PrefixSpan with pseudo-projection, especially for dense and highly similar sequence databases.
Keywords :
data analysis; data mining; graph theory; search problems; PFI-PrefixSpan; PFI-graph; dense dataset; prefix-frequent-items graph; prefix-frequent-items prefixspan; pseudoprojection; search space pruning methods; sequential pattern mining; Algorithm design and analysis; Computer science; Data mining; Databases; Educational institutions; Electronic mail; Runtime; PrefixSpan; dense database; sequential pattern mining; highly similar sequence; long sequence;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on
Conference_Location :
Shiyang
Type :
conf
DOI :
10.1109/ICCIS.2013.205
Filename :
6643121