Title of article :
HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection
Author/Authors :
Sang, Xiuzhi (Shanghai Ocean University, Shanghai, China); Xiao, Wanyue (School of Information, Syracuse University, Syracuse, USA); Zheng, Huiwen (School of Engineering, University of Melbourne, Victoria, Australia); Yang, Yang (School of Information Management, Nanjing University, Nanjing, China); Liu, Taigang (Shanghai Ocean University, Shanghai, China)
Abstract :
Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science because these proteins play a
crucial role in all aspects of biological activity. Although considerable effort has been devoted to developing powerful
computational methods for this problem, it remains a challenging task in bioinformatics. Hidden Markov model (HMM)
profiles have been shown to provide important clues for improving DBP prediction performance. In this paper, we propose
a method, called HMMPred, which extracts amino acid composition features and auto- and cross-covariance transformation
features from HMM profiles to train a machine learning model for identifying DBPs. A feature selection step based on the
extreme gradient boosting (XGBoost) algorithm is then applied, and the selected optimal features are fed into a support
vector machine (SVM) classifier to predict DBPs. Experimental results on two benchmark datasets show that the proposed
method outperforms most existing methods and could serve as an alternative tool for identifying DBPs.
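The abstract names the pipeline stages but not their formulas, so the following is only a minimal sketch of that kind of pipeline in plain Python, not HMMPred's actual implementation. It assumes the HMM profile is an L x 20 matrix of per-residue amino acid scores (how it is generated and normalized is not stated here), that the composition features are the column means of that matrix, and that the auto- and cross-covariance features follow the common ACC definition with lags 1 to `max_lag`. The function names `hmm_aac`, `hmm_acc`, `hmm_features`, `top_k_indices`, and `select_features` are hypothetical. The importance scores passed to `top_k_indices` stand in for those of a trained XGBoost model; the model itself and the downstream SVM are not shown.

```python
from typing import List, Sequence

# Assumption: a profile is L rows (one per residue) x 20 columns (amino acid scores).
Profile = Sequence[Sequence[float]]


def hmm_aac(profile: Profile) -> List[float]:
    """Composition-style features: the mean of each profile column (assumed definition)."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[c] for row in profile) / length for c in range(n_cols)]


def hmm_acc(profile: Profile, max_lag: int) -> List[float]:
    """Auto-covariance (i == k) and cross-covariance (i != k) for lags 1..max_lag.

    Assumed ACC definition: for each lag g and columns i, k,
    sum over j of (P[j][i] - mean_i) * (P[j + g][k] - mean_k), divided by (L - g).
    """
    length = len(profile)
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be at least 1 and smaller than the profile length")
    means = hmm_aac(profile)
    n_cols = len(means)
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for i in range(n_cols):
            for k in range(n_cols):
                total = sum(
                    (profile[j][i] - means[i]) * (profile[j + lag][k] - means[k])
                    for j in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


def hmm_features(profile: Profile, max_lag: int = 2) -> List[float]:
    """Concatenate composition and covariance features into one vector."""
    return hmm_aac(profile) + hmm_acc(profile, max_lag)


def top_k_indices(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features, returned in their original order.

    The scores are assumed to come from a trained XGBoost model; ties keep
    their original order because Python's sort is stable.
    """
    ranked = sorted(range(len(importances)), key=lambda idx: importances[idx], reverse=True)
    return sorted(ranked[:k])


def select_features(vector: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Apply one fixed index list to a feature vector (same list for every sample)."""
    return [vector[idx] for idx in indices]


if __name__ == "__main__":
    # Toy 4-residue profile: every column of row j holds the value j.
    toy = [[float(j)] * 20 for j in range(4)]
    vec = hmm_features(toy, max_lag=2)
    print(len(vec))  # 20 + 2 * 400 = 820
    print(vec[0])    # column mean: (0 + 1 + 2 + 3) / 4 = 1.5
```

With `max_lag = 2`, each profile yields 20 + 2 x 400 = 820 features, since each lag contributes one value per ordered pair of the 20 columns. The index list returned by `top_k_indices` is computed once and applied to every sample, so that the reduced vectors passed to the SVM stay aligned.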
Keywords :
HMMPred, DNA, XGBoost, DBP
Journal title :
Computational and Mathematical Methods in Medicine