• DocumentCode
    3363199
  • Title
    Mining emerging substrings
  • Author
    Chan, Sarah ; Kao, Ben ; Yip, C.L. ; Tang, Michael
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Syst., City Univ. of Hong Kong, China
  • fYear
    2003
  • fDate
    26-28 March 2003
  • Firstpage
    119
  • Lastpage
    126
  • Abstract
    We introduce a new type of KDD pattern called emerging substrings. In a sequence database, an emerging substring (ES) of a data class is a substring that occurs more frequently in that class than in other classes. ESs are important to sequence classification because they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers. We propose a suffix tree-based framework for mining ESs, and study the effectiveness of applying one or more pruning techniques at different stages of our ES mining algorithm. Experimental results show that if the target class is small relative to the whole database, which is the normal scenario in single-class ES mining, most of the pruning techniques achieve considerable performance gain.
  • Keywords
    data mining; pattern recognition; string matching; KDD patterns; contrasts; pruning techniques; sequence classification; sequence classifiers; sequence database; single-class emerging substring mining; suffix tree-based framework; Classification tree analysis; Companies; Computer science; Data mining; Databases; Electronic switching systems; Humans; Information systems; Partitioning algorithms; Performance gain
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on
  • Conference_Location
    Kyoto, Japan
  • Print_ISBN
    0-7695-1895-8
  • Type
    conf
  • DOI
    10.1109/DASFAA.2003.1192375
  • Filename
    1192375
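
The abstract defines an emerging substring as one that occurs more frequently in a target class than in the other classes. The sketch below illustrates that definition only. It is a brute-force enumeration, not the suffix tree-based framework or the pruning techniques the paper proposes. The function names, the `max_len` cap, and the `min_support` and `min_growth` thresholds are assumptions introduced for illustration; the abstract does not state how "more frequently" is quantified.

```python
from collections import Counter


def substring_support(sequences, max_len):
    """Return {substring: fraction of sequences that contain it}.

    Only substrings of length 1..max_len are enumerated (brute force).
    """
    if not sequences:
        return {}
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each substring at most once per sequence
        for i in range(len(seq)):
            for j in range(i + 1, min(len(seq), i + max_len) + 1):
                seen.add(seq[i:j])
        counts.update(seen)
    n = len(sequences)
    return {s: c / n for s, c in counts.items()}


def emerging_substrings(target, others, max_len=3,
                        min_support=0.5, min_growth=2.0):
    """Substrings much more frequent in `target` than in `others`.

    Assumed criterion (hypothetical, not taken from the abstract):
    support in the target class is at least `min_support`, and the ratio
    target support / other support is at least `min_growth`.
    """
    sup_target = substring_support(target, max_len)
    sup_other = substring_support(others, max_len)
    result = {}
    for s, st in sup_target.items():
        if st < min_support:
            continue
        so = sup_other.get(s, 0.0)
        growth = float("inf") if so == 0 else st / so
        if growth >= min_growth:
            result[s] = (st, so)  # (target support, other support)
    return result


if __name__ == "__main__":
    target_class = ["abcd", "xabc", "abce"]
    other_classes = ["bcd", "cde", "xyz", "bce"]
    for sub, (st, so) in sorted(emerging_substrings(target_class, other_classes).items()):
        print(f"{sub!r}: target={st:.2f} others={so:.2f}")
```

Enumerating every substring this way costs time proportional to the total sequence length times `max_len`, which is why an index structure such as the suffix tree named in the abstract is attractive for larger databases.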