DocumentCode
856368
Title
Rule generation for protein secondary structure prediction with support vector machines and decision tree
Author
He, Jieyue ; Hu, Hae-Jin ; Harrison, Robert ; Tai, Phang C. ; Pan, Yi
Author_Institution
Dept. of Comput. Sci. & Eng., Nanjing Univ.
Volume
5
Issue
1
fYear
2006
fDate
3/1/2006 12:00:00 AM
Firstpage
46
Lastpage
53
Abstract
Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, their poor comprehensibility hinders their adoption for protein structure prediction. An explanation of how a decision is made is important for the acceptance of machine learning technology, especially in applications such as bioinformatics. A reasonable interpretation is useful for guiding "wet experiments," and the extracted rules also help integrate computational intelligence with symbolic AI systems for advanced deduction. A decision tree, on the other hand, has good comprehensibility. This paper presents a novel approach to rule generation for protein secondary structure prediction that integrates the merits of both the SVM and the decision tree. The approach combines the two into a new algorithm called SVM_DT, which proceeds in three steps. First, an SVM is trained. Then, a new training set is generated through careful selection from the output of the SVM. Finally, the resulting training set is used to train a decision tree learning system and to extract the corresponding rule sets. Experiments on protein secondary structure prediction with the RS126 data set show that the comprehensibility of SVM_DT is much better than that of the SVM. Moreover, the generalization ability of SVM_DT is better than that of C4.5 decision trees and similar to that of the SVM. Hence, SVM_DT can be used not only for prediction, but also for guiding biological experiments.
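The three steps named in the abstract (train an SVM, select a new training set from its output, train a decision tree and read off rules) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy 2-D data, the linear SVM trained by plain subgradient descent, the selection criterion (keep only samples whose SVM prediction agrees with their label, since the abstract says only "careful selection"), and the small Gini-based tree standing in for C4.5. All function names are hypothetical.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.1, iters=300):
    """Step 1 (assumed form): linear soft-margin SVM, full-batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:        # hinge loss is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

def gini(labels):
    p = sum(1 for v in labels if v == 1) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=0, max_depth=3):
    """Step 3 (stand-in for C4.5): small binary tree with Gini-based threshold splits."""
    if len(set(y)) == 1 or depth == max_depth:
        return 1 if sum(y) >= 0 else -1          # leaf holds the majority class
    best = None                                  # (weighted Gini, feature, threshold)
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for xi, yi in zip(X, y) if xi[f] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:                             # identical points, nothing to split on
        return 1 if sum(y) >= 0 else -1
    _, f, thr = best
    L = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= thr]
    R = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > thr]
    return (f, thr,
            build_tree([x for x, _ in L], [v for _, v in L], depth + 1, max_depth),
            build_tree([x for x, _ in R], [v for _, v in R], depth + 1, max_depth))

def tree_predict(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def extract_rules(node, conds=()):
    """Turn each root-to-leaf path into one IF ... THEN rule."""
    if not isinstance(node, tuple):
        return ["IF " + (" AND ".join(conds) or "TRUE") + f" THEN class {node:+d}"]
    f, thr, left, right = node
    return (extract_rules(left, conds + (f"x{f} <= {thr:g}",)) +
            extract_rules(right, conds + (f"x{f} > {thr:g}",)))

# Toy data (hypothetical): two clusters plus one mislabelled point inside the +1 cluster.
X = [(2, 2), (3, 2), (2, 3), (3, 3), (-2, -2), (-3, -2), (-2, -3), (-3, -3), (2.5, 2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)                                  # step 1
selected = [(xi, yi) for xi, yi in zip(X, y)
            if svm_predict(w, b, xi) == yi]                    # step 2 (assumed criterion)
tree = build_tree([x for x, _ in selected], [v for _, v in selected])  # step 3
rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

On this toy data the mislabelled point is filtered out in step 2, so the tree collapses to a single split on `x0` and yields two readable rules. A real experiment would replace the 2-D points with encoded amino-acid windows from a data set such as RS126 and use a full SVM and C4.5 implementation.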
Keywords
biology computing; decision trees; learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; support vector machines; AI; SVM_DT; bioinformatics; comprehensibility; computational intelligence; decision tree; machine learning; protein secondary structure prediction; rule generation; Computer science; Helium; Pattern recognition; Scholarships; Support vector machine classification; protein structure; rule extraction; support vector machine (SVM)
fLanguage
English
Journal_Title
NanoBioscience, IEEE Transactions on
Publisher
IEEE
ISSN
1536-1241
Type
jour
DOI
10.1109/TNB.2005.864021
Filename
1603533