Title :
Classification of unaligned sequences based on prototype motifs representation
Author :
Arango-Argoty, G.A. ; Jaramillo-Garzón, J.A. ; Röthlisberger, S. ; Castellanos-Domínguez, C.G.
Author_Institution :
Signal Process. & Recognition Group, Univ. Nac. de Colombia sede Manizales, Manizales, Colombia
Abstract :
Estimating the function of unknown proteins is one of the main goals in bioinformatics. In the last few years, many pattern recognition algorithms have been developed, usually focusing on global information about the protein. Alternatively, predictions can be made by identifying functional sub-sequence patterns, or motifs, but most motif detection methods require a predefined fixed window size and aligned sequences. To address these problems, we propose a method based on variable-length motif detection using the continuous wavelet transform, together with a dissimilarity space representation used to evaluate the performance of our approach. A support vector machine classifier is used as the predictor. The highest sensitivity (Sn = 96.67%) and specificity (Sp = 96.79%), estimated with 10-fold cross-validation, are obtained for protein kinases. The average results (Sn = 81.17%, Sp = 87.43%) are comparable to the current state of the art.
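As a rough illustration of two ideas named in the abstract, the sketch below shows a generic dissimilarity space representation (each sample described by its distances to a set of prototypes) and the usual definitions of Sn and Sp. It is not the authors' implementation: the function names, the use of fixed-length feature vectors, and the Euclidean distance are all assumptions made for this example, since the abstract does not specify the dissimilarity measure or how the motif prototypes are compared.

```python
import math


def dissimilarity_representation(features, prototypes):
    """Map each feature vector to its distances from every prototype.

    Assumption: samples and prototypes are equal-length numeric vectors
    and the dissimilarity is Euclidean; the abstract does not state the
    measure actually used.
    """
    return [[math.dist(x, p) for p in prototypes] for x in features]


def sensitivity_specificity(y_true, y_pred):
    """Return (Sn, Sp) for binary labels, where 1 marks the positive class.

    Sn = TP / (TP + FN), Sp = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy data: two samples described against two prototypes.
    samples = [(0.0, 0.0), (3.0, 4.0)]
    prototypes = [(3.0, 4.0), (0.0, 0.0)]
    print(dissimilarity_representation(samples, prototypes))
    print(sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

In a pipeline like the one the abstract outlines, the rows returned by `dissimilarity_representation` would be the inputs to a classifier such as a support vector machine, and `sensitivity_specificity` would be evaluated on each cross-validation fold.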
Keywords :
bioinformatics; pattern classification; proteins; support vector machines; wavelet transforms; continuous wavelet transform; dissimilarity space representation; motif representation; pattern recognition algorithm; protein function estimation; protein kinases; sequence classification; support vector machine classifier; Amino acids; Prototypes; Molecular function; Motifs; hydropathy scale; support vector machine; wavelet transform
Conference_Title :
Computing Congress (CCC), 2011 6th Colombian
Conference_Location :
Manizales
Print_ISBN :
978-1-4577-0285-3
DOI :
10.1109/COLOMCC.2011.5936297