DocumentCode
3363199
Title
Mining emerging substrings
Author
Chan, Sarah ; Kao, Ben ; Yip, C.L. ; Tang, Michael
Author_Institution
Dept. of Comput. Sci. & Inf. Syst., City Univ. of Hong Kong, China
fYear
2003
fDate
26-28 March 2003
Firstpage
119
Lastpage
126
Abstract
We introduce a new type of KDD pattern called emerging substrings. In a sequence database, an emerging substring (ES) of a data class is a substring that occurs more frequently in that class than in other classes. ESs are important to sequence classification because they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers. We propose a suffix tree-based framework for mining ESs and study the effectiveness of applying one or more pruning techniques at different stages of our ES mining algorithm. Experimental results show that if the target class has a small population relative to the whole database, which is the normal scenario in single-class ES mining, most of the pruning techniques achieve considerable performance gains.
Keywords
data mining; pattern recognition; string matching; KDD patterns; contrasts; pruning techniques; sequence classification; sequence classifiers; sequence database; single-class emerging substring mining; suffix tree-based framework; Classification tree analysis; Companies; Computer science; Data mining; Databases; Electronic switching systems; Humans; Information systems; Partitioning algorithms; Performance gain
fLanguage
English
Publisher
IEEE
Conference_Titel
Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on
Conference_Location
Kyoto, Japan
Print_ISBN
0-7695-1895-8
Type
conf
DOI
10.1109/DASFAA.2003.1192375
Filename
1192375