DocumentCode
2790257
Title
Conditional Random Fields Feature Subset Selection Based on Genetic Algorithms for Phosphorylation Site Prediction
Author
Dang, Thanh Hai ; Engelen, Kristof ; Meysman, Pieter ; Marchal, Kathleen ; Verschoren, Alain ; Laukens, Kris
Author_Institution
Dept. of Math. & Comput. Sci., Intell. Syst. Lab., Antwerp, Belgium
fYear
2009
fDate
13-17 Oct. 2009
Firstpage
7
Lastpage
12
Abstract
Conditional random fields (CRFs) are undirected probabilistic graphical models that were introduced for solving sequence labeling and segmentation problems. CRFs have several advantages over other well-understood and widely used techniques such as hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs). Being conditional models, CRFs do not explicitly model the input data sequences but use feature functions (features) to incorporate the arbitrary interactions and interdependencies that exist in the observation sequences. The number of possible features is extremely large, up to millions, and the features are usually specified in advance or produced by a feature-generating scheme based on domain knowledge. This paper introduces a feature subset selection method for CRFs based on genetic algorithms, in which a population of candidate feature function subsets is evolved to maximize CRF performance. The method was experimentally validated on the well-known bioinformatics problem of protein phosphorylation site prediction, phosphorylation being one of the most important protein modification mechanisms.
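Illustrative sketch (not part of the original abstract, and not the authors' implementation): the idea of evolving a population of candidate feature subsets can be outlined as a plain genetic algorithm over bitmasks, where bit i says whether feature function i is kept. Everything named here is an assumption made for illustration: the function evolve_feature_subsets, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and toy_fitness, which is a stand-in for training a CRF on the selected features and scoring it on held-out data.

```python
import random


def evolve_feature_subsets(fitness, n_features, pop_size=20, generations=30,
                           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve 0/1 masks over n_features; return (best_mask, best_score).

    `fitness` is any callable mapping a mask to a number (higher is better).
    In the setting of the abstract it would train and validate a CRF; that
    step is assumed here and not implemented.
    """
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags: 1 keeps the feature function.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = ind[:], score

        def tournament():
            # Binary tournament: pick two individuals, keep the fitter one.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best_mask[:]]  # elitism: carry the best forward
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each position.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population

    return best_mask, best_score


def toy_fitness(mask):
    """Hypothetical stand-in for a CRF validation score.

    Rewards keeping features 0, 3 and 5; lightly penalises all others.
    """
    informative = (0, 3, 5)
    hits = sum(mask[i] for i in informative)
    noise = sum(mask) - hits
    return hits - 0.1 * noise


best_mask, best_score = evolve_feature_subsets(toy_fitness, n_features=8)
```

In a real experiment the fitness call dominates the cost, since each evaluation means training a CRF, so caching scores for masks that have already been evaluated would be a natural refinement.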
Keywords
biochemistry; bioinformatics; genetic algorithms; graph theory; molecular biophysics; probability; proteins; bioinformatics; conditional random fields; domain knowledge; feature subset selection; genetic algorithms; probabilistic graphical models; protein modification; protein phosphorylation site prediction; Bioinformatics; Entropy; Genetic algorithms; Graphical models; Hidden Markov models; Input variables; Labeling; Laboratories; Proteins; Sequences; Bioinformatics; Conditional Random Fields; Genetic Algorithm; Phosphorylation site prediction;
fLanguage
English
Publisher
ieee
Conference_Titel
Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
Conference_Location
Hanoi
Print_ISBN
978-1-4244-5086-2
Electronic_ISBN
978-0-7695-3846-4
Type
conf
DOI
10.1109/KSE.2009.11
Filename
5361737