Title :
Classification of categorical sequences
Author :
Kelil, Abdellali ; Nordell-Markovits, Alexei ; Wang, Shengrui
Author_Institution :
ProspectUS Lab., Univ. of Sherbrooke, Sherbrooke, QC, Canada
Abstract :
The classification of categorical sequences is a fundamental process in many application fields. A key issue is to extract and make use of the significant features hidden behind the chronological and structural dependencies found in these sequences. Almost all existing algorithms designed for this task rely on matching patterns in chronological order, yet sequences often share similar structural features in non-chronological order. In addition, these algorithms have serious difficulty outperforming domain-specific algorithms. In this paper we propose CLASS, a general approach for the classification of categorical sequences. CLASS captures the significant patterns and reduces the influence of those that merely represent noise. Moreover, CLASS employs a classifier called SNN (significant-nearest-neighbours), inspired by K-nearest-neighbours, with a dynamic estimation of K. Extensive tests performed on a range of datasets from different fields show that CLASS is often competitive with domain-specific approaches.
Keywords :
feature extraction; pattern classification; pattern matching; CLASS approach; categorical sequence classification; dynamic estimation; significant-nearest-neighbours; Algorithm design and analysis; Costs; Data mining; Laboratories; Matrix decomposition; Noise reduction; Performance evaluation; Proteins; Testing
Conference_Title :
Communications, Computers and Signal Processing, 2009. PacRim 2009. IEEE Pacific Rim Conference on
Conference_Location :
Victoria, BC
Print_ISBN :
978-1-4244-4560-8
Electronic_ISBN :
978-1-4244-4561-5
DOI :
10.1109/PACRIM.2009.5291297