• DocumentCode
    419553
  • Title
    Feature extraction for improved profile HMM based biological sequence analysis
  • Author
    Plötz, Thomas ; Fink, Gernot A.
  • Author_Institution
    Fac. of Technol., Bielefeld Univ., Germany
  • Volume
    2
  • fYear
    2004
  • fDate
    23-26 Aug. 2004
  • Firstpage
    315
  • Abstract
    State-of-the-art systems for biological sequence analysis employ statistical modeling techniques, most notably profile HMMs. However, all approaches still rely on a purely symbolic sequence representation, which severely limits their ability to describe weak similarities between remotely homologous members of sequence families. Therefore, we propose a multi-channel, signal-like sequence representation based on a combination of several numerically encoded biochemical properties of the individual residues. From this representation, features capturing relevant local sequence properties are extracted by applying wavelet and principal component analysis. Evaluation results on a challenging sequence family classification task show that profile HMMs trained on the feature-based sequence representation significantly outperform discrete models.
  • Keywords
    biology computing; feature extraction; hidden Markov models; principal component analysis; proteins; sequences; wavelet transforms; biological sequence analysis; multichannel signal sequence representation; symbolic sequence representation; wavelet analysis; Amino acids; Biological information theory; Biological system modeling; Data mining; Discrete wavelet transforms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2128-2
  • Type
    conf
  • DOI
    10.1109/ICPR.2004.1334187
  • Filename
    1334187
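
The pipeline outlined in the abstract (numeric encoding of residues into several channels, then wavelet and principal component analysis over local context) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the biochemical properties, the wavelet, the window size, or the PCA procedure, so everything below is an assumption chosen for illustration: the Kyte-Doolittle hydropathy scale and a simplified side-chain charge as the two channels, a single-level Haar transform, a window of 8 residues, and power iteration for the first principal component. All function names are hypothetical.

```python
import math

# Channel 1 (assumed): Kyte-Doolittle hydropathy index per residue.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Channel 2 (assumed, simplified): side-chain charge near neutral pH.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(seq):
    """Map a protein sequence to rows of [hydropathy, charge]."""
    return [[HYDROPATHY[r], CHARGE.get(r, 0.0)] for r in seq]


def haar_step(signal):
    """One level of the orthonormal Haar DWT on an even-length signal."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def window_features(channels, width=8):
    """Slide a window over the sequence and concatenate the Haar
    coefficients of every channel into one feature vector per position."""
    feats = []
    for start in range(len(channels) - width + 1):
        row = []
        for c in range(len(channels[0])):
            sig = [channels[start + k][c] for k in range(width)]
            approx, detail = haar_step(sig)
            row.extend(approx + detail)
        feats.append(row)
    return feats


def first_principal_component(rows, iters=200):
    """Estimate the leading principal axis by power iteration on the
    (unnormalized) covariance of the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v, mean


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    feats = window_features(encode(seq))
    axis, mean = first_principal_component(feats)
    # Project each window's features onto the leading principal axis.
    scores = [sum((f[j] - mean[j]) * axis[j] for j in range(len(axis)))
              for f in feats]
    print(len(feats), len(feats[0]), len(scores))
```

In a full system the projected feature vectors, rather than residue symbols, would serve as the observations of the profile HMM; that modeling step is outside the scope of this sketch.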