DocumentCode
419553
Title
Feature extraction for improved profile HMM based biological sequence analysis
Author
Plötz, Thomas ; Fink, Gernot A.
Author_Institution
Fac. of Technol., Bielefeld Univ., Germany
Volume
2
fYear
2004
fDate
23-26 Aug. 2004
Firstpage
315
Abstract
State-of-the-art systems for biological sequence analysis employ statistical modeling techniques, most notably so-called profile HMMs. However, all approaches still rely on a purely symbolic sequence representation, which severely limits their capabilities in describing weak similarities between remotely homologous members of sequence families. Therefore, we propose a multi-channel signal-like sequence representation based on a combination of several numerically encoded biochemical properties of the individual residues. From this representation, features capturing relevant local sequence properties are extracted by applying wavelet and principal component analysis. Evaluation results on a challenging task of sequence family classification show that profile HMMs trained on the feature-based sequence representation significantly outperform discrete models.
Keywords
biology computing; feature extraction; hidden Markov models; principal component analysis; proteins; sequences; wavelet transforms; biological sequence analysis; feature extraction; hidden Markov model; multichannel signal sequence representation; principal component analysis; symbolic sequence representation; wavelet analysis; Amino acids; Biological information theory; Biological system modeling; Data mining; Discrete wavelet transforms; Feature extraction; Hidden Markov models; Principal component analysis; Proteins; Sequences;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
ISSN
1051-4651
Print_ISBN
0-7695-2128-2
Type
conf
DOI
10.1109/ICPR.2004.1334187
Filename
1334187