• DocumentCode
    2790257
  • Title
    Conditional Random Fields Feature Subset Selection Based on Genetic Algorithms for Phosphorylation Site Prediction
  • Author
    Dang, Thanh Hai ; Engelen, Kristof ; Meysman, Pieter ; Marchal, Kathleen ; Verschoren, Alain ; Laukens, Kris
  • Author_Institution
    Dept. of Math. & Comput. Sci., Intell. Syst. Lab., Antwerp, Belgium
  • fYear
    2009
  • fDate
    13-17 Oct. 2009
  • Firstpage
    7
  • Lastpage
    12
  • Abstract
    Conditional random fields (CRFs) are undirected probabilistic graphical models that were introduced for solving sequence labeling and segmentation problems. CRFs have several advantages over other well-understood and widely used techniques such as hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs). Being conditional models, CRFs do not explicitly model the input data sequences but use feature functions (features) to incorporate the arbitrary interactions and interdependencies that exist in the observation sequences. The number of possible features is extremely large, up to millions, and the features are usually specified in advance or produced by a feature-generating scheme based on domain knowledge. This paper introduces a feature subset selection method for CRFs based on genetic algorithms, in which a population of candidate feature function subsets is evolved to maximize CRF performance. The method was experimentally validated on the well-known bioinformatics problem of protein phosphorylation site prediction, phosphorylation being one of the most important protein modification mechanisms.
  • Keywords
    biochemistry; bioinformatics; genetic algorithms; graph theory; molecular biophysics; probability; proteins; conditional random fields; domain knowledge; feature subset selection; probabilistic graphical models; protein modification; protein phosphorylation site prediction; Entropy; Graphical models; Hidden Markov models; Input variables; Labeling; Laboratories; Sequences; Phosphorylation site prediction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4244-5086-2
  • Electronic_ISBN
    978-0-7695-3846-4
  • Type
    conf
  • DOI
    10.1109/KSE.2009.11
  • Filename
    5361737
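
The abstract describes evolving a population of candidate feature subsets toward maximal CRF performance. The following is a minimal sketch of that general idea, not the authors' implementation: each candidate is a bit mask over features, and a simple genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism) searches for a high-scoring mask. The function names, the parameter defaults, and the `toy_fitness` function are all assumptions made for illustration; in a real setting the fitness would be the validation performance of a CRF trained with the selected features.

```python
import random

# Hypothetical stand-in for "CRF performance": rewards a fixed set of
# useful features and lightly penalizes every other selected feature.
USEFUL = {0, 3, 5, 8}

def toy_fitness(mask):
    hits = sum(1.0 for i in USEFUL if mask[i])
    noise = sum(bit for i, bit in enumerate(mask) if i not in USEFUL)
    return hits - 0.25 * noise

def evolve_feature_subsets(n_features, fitness, pop_size=20, generations=30,
                           mutation_rate=0.05, seed=0):
    """Evolve bit masks (1 = feature kept); assumes n_features >= 2."""
    rng = random.Random(seed)
    # Random initial population of candidate feature subsets.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Elitism: carry the best subset found so far into the next generation.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            # Tournament selection of two parents (tournament size 2).
            parents = []
            for _ in range(2):
                a, b = rng.sample(population, 2)
                parents.append(a if fitness(a) >= fitness(b) else b)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n_features - 1)
            child = parents[0][:cut] + parents[1][cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best

best = evolve_feature_subsets(12, toy_fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. Replacing `toy_fitness` with a function that trains and scores a CRF makes each evaluation expensive, so caching fitness values per mask would be a natural refinement.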