DocumentCode
2454598
Title
Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method
Author
Davidson, Nicholas J. ; Wang, Xueyi
Author_Institution
Dept. of Math., Boise State Univ., Boise, ID, USA
fYear
2010
fDate
12-14 Dec. 2010
Firstpage
546
Lastpage
551
Abstract
As a growing number of protein structures are resolved without known functions, computational methods that help predict protein function from structure are becoming increasingly important. Some computational methods predict protein function by aligning a protein to homologous proteins with known functions, but they fail when no such homology can be identified. In this paper we classify enzymes and non-enzymes using non-alignment features. We propose a new ensemble method that combines three support vector machines (SVMs) and two k-nearest neighbor (k-NN) classifiers under a simple majority voting rule. Tested on a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig, the method achieves 85.59% accuracy in 10-fold cross-validation and 86.49% accuracy in leave-one-out validation. This prediction accuracy is much better than that of other methods based on non-alignment features and even slightly better than that of methods based on alignment features. To our knowledge, our method is the first to use an ensemble to classify enzymes and non-enzymes, and it is superior to a single classifier.
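The simple majority voting rule named in the abstract can be illustrated with a minimal sketch. It assumes each of the five classifiers (three SVMs, two k-NNs) has already produced a binary label per protein; the function name `majority_vote` and the prediction lists are hypothetical and are not taken from the paper, and the sketch does not reproduce the authors' features, kernels, or parameters.

```python
from collections import Counter


def majority_vote(votes_per_classifier):
    """Combine label predictions from several classifiers by simple majority.

    votes_per_classifier: list of equal-length label sequences, one per classifier.
    Returns one label per sample: the label predicted by the most classifiers.
    """
    n_samples = len(votes_per_classifier[0])
    if any(len(v) != n_samples for v in votes_per_classifier):
        raise ValueError("every classifier must predict every sample")
    combined = []
    # zip(*...) regroups the votes so each tuple holds all votes for one sample.
    for sample_votes in zip(*votes_per_classifier):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions for four proteins (1 = enzyme, 0 = non-enzyme).
svm_a = [1, 0, 1, 0]
svm_b = [1, 0, 0, 0]
svm_c = [0, 0, 1, 1]
knn_a = [1, 1, 1, 0]
knn_b = [1, 0, 0, 1]

ensemble = majority_vote([svm_a, svm_b, svm_c, knn_a, knn_b])
print(ensemble)
```

With five voters and two classes a tie cannot occur, which is one practical reason to use an odd number of base classifiers under this rule.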
Keywords
biology computing; enzymes; feature extraction; pattern classification; prediction theory; support vector machines; computational method; ensemble method; homologous protein functions prediction; k-nearest neighbor algorithms; nonalignment features based enzyme classification; protein structure; support vector machines; Accuracy; Kernel; Magnesium; Proteins; Support vector machine classification; ensemble methods; enzyme/non-enzyme classification; k-nearest neighbour algorithm; support vector machine;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
Conference_Location
Washington, DC
Print_ISBN
978-1-4244-9211-4
Type
conf
DOI
10.1109/ICMLA.2010.167
Filename
5708884