DocumentCode :
2282911
Title :
Interpretable knowledge acquisition for predicting DNA-binding domains using an evolutionary fuzzy classifier method
Author :
Huang, Hui-Ling ; Chang, Fang-Lin ; Ho, Shinn-Jang ; Shu, Li-Sun ; Ho, Shinn-Ying
Author_Institution :
Dept. of Biol. Sci. & Technol., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Volume :
4
fYear :
2011
fDate :
10-12 June 2011
Firstpage :
295
Lastpage :
299
Abstract :
DNA-binding domains are functional protein domains in a cell that play a vital role in various essential biological activities. It is desirable to predict and analyze novel proteins with machine learning approaches using protein sequences alone. Numerous prediction methods have been proposed that identify informative features and design effective classifiers. The support vector machine (SVM) is well recognized as an accurate and robust classifier. However, the black-box mechanism of SVM offers biologists little interpretability. A prediction method whose features and prediction results are interpretable is therefore preferable. In this study, we propose an interpretable physicochemical property classifier (named iPPC) with an accurate and compact fuzzy rule base that uses a scatter partition of the feature space for DNA-binding data analysis. In designing iPPC, the flexible membership functions, the fuzzy rules, and the selection of physicochemical properties are optimized simultaneously. An intelligent genetic algorithm (IGA) is used to efficiently solve this design problem, which has a large number of tuning parameters, with three objectives: maximize prediction accuracy, minimize the number of selected features, and minimize the number of fuzzy rules. On benchmark datasets of DNA-binding domains, iPPC achieves a training accuracy of 81% and a test accuracy of 79% using three fuzzy rules and two physicochemical properties. Compared with the decision tree method, which has a training accuracy of 77%, iPPC yields a more compact and interpretable knowledge base. The two physicochemical properties are "Number of hydrogen bond donors" and "Helix-coil equilibrium constant" from the AAindex database.
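To make the idea of a scatter-partition fuzzy rule base concrete, here is a minimal sketch of how such a classifier and a multi-objective fitness might be evaluated. It is not the authors' implementation: the Gaussian membership shape, the product of memberships as rule firing strength, single-winner inference, and the weighted-sum fitness (with hypothetical weights `w_feat` and `w_rule`) are all assumptions, and the names `FuzzyRule`, `classify`, and `fitness` are hypothetical. In a scatter partition each rule carries its own membership-function centers and widths rather than sharing a grid of fuzzy sets. The IGA search over centers, widths, feature selection, and rule count is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FuzzyRule:
    # Scatter partition: every rule owns its own fuzzy sets (one per selected feature).
    centers: List[float]
    widths: List[float]  # assumed Gaussian widths; must be nonzero
    label: int           # class predicted when this rule wins


def membership(x: float, center: float, width: float) -> float:
    """Assumed Gaussian membership function."""
    return math.exp(-((x - center) ** 2) / (2.0 * width * width))


def firing_strength(rule: FuzzyRule, x: Sequence[float], selected: Sequence[int]) -> float:
    """Product of memberships over the selected features only."""
    strength = 1.0
    for k, j in enumerate(selected):
        strength *= membership(x[j], rule.centers[k], rule.widths[k])
    return strength


def classify(rules: List[FuzzyRule], x: Sequence[float], selected: Sequence[int]) -> int:
    """Single-winner inference: the rule with the highest firing strength decides."""
    best = max(rules, key=lambda r: firing_strength(r, x, selected))
    return best.label


def fitness(rules, selected, X, y, w_feat=0.01, w_rule=0.01) -> float:
    """Hypothetical weighted sum: reward accuracy, penalize features and rules."""
    correct = sum(classify(rules, xi, selected) == yi for xi, yi in zip(X, y))
    accuracy = correct / len(y)
    return accuracy - w_feat * len(selected) - w_rule * len(rules)


# Toy usage with made-up numbers (not data from the paper):
# three rules over two selected features (indices 0 and 2); feature 1 is ignored.
selected = [0, 2]
rules = [
    FuzzyRule([0.2, 0.8], [0.2, 0.2], 1),
    FuzzyRule([0.8, 0.2], [0.2, 0.2], 0),
    FuzzyRule([0.5, 0.5], [0.1, 0.1], 0),
]
X = [[0.1, 9.9, 0.9], [0.9, 9.9, 0.1], [0.5, 9.9, 0.5]]
y = [1, 0, 0]

print([classify(rules, xi, selected) for xi in X])
print(fitness(rules, selected, X, y))
```

Because the fitness penalizes both `len(selected)` and `len(rules)`, a search procedure maximizing it is pushed toward small rule bases over few properties, which mirrors the compactness goal stated in the abstract; the actual objective formulation used with the IGA may differ.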
Keywords :
DNA; biology computing; data analysis; fuzzy set theory; genetic algorithms; knowledge acquisition; knowledge based systems; learning (artificial intelligence); pattern classification; proteins; support vector machines; AAindex database; DNA-binding data analysis; DNA-binding domain prediction; Helix-coil equilibrium constant; evolutionary fuzzy classifier method; flexible membership function; fuzzy rule base; hydrogen bond donors; iPPC; intelligent genetic algorithm; interpretable physicochemical property classifier; knowledge acquisition; knowledge base; machine learning approach; protein sequences; support vector machine; Accuracy; Amino acids; Bioinformatics; DNA; Proteins; Support vector machines; Training; DNA-binding; fuzzy classifier; genetic algorithm; knowledge acquisition; physicochemical properties; prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Automation Engineering (CSAE), 2011 IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-8727-1
Type :
conf
DOI :
10.1109/CSAE.2011.5952854
Filename :
5952854