Author/Authors :
Zhang, Lu Shanghai Maritime University - Shanghai, China , Liu, Min Shanghai Maritime University - Shanghai, China , Qin, Xinyi Shanghai Maritime University - Shanghai, China , Liu, Guangzhong Shanghai Maritime University - Shanghai, China
Abstract :
Succinylation is an important posttranslational modification of proteins, which plays a key role in protein conformation regulation
and cellular function control. Many studies have shown that succinylation of protein lysine residues is closely related
to the occurrence of many diseases. To understand the succinylation mechanism in depth, it is necessary to accurately identify
succinylation sites in proteins. In this study, we develop a new model, IFS-LightGBM (BO), which combines the
incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and
the LightGBM classifier, to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC),
position-specific scoring matrix (PSSM), disorder status, and composition of k-spaced amino acid pairs (CKSAAP) are first
employed to extract feature information. Then, the LightGBM feature selection method is combined with the
incremental feature selection (IFS) method to select the optimal feature subset for the LightGBM classifier. Finally, to increase
prediction accuracy and reduce the computational load, the Bayesian optimization algorithm is used to optimize the parameters
of the LightGBM classifier. The results show that the IFS-LightGBM (BO)-based prediction model performs better on common
evaluation metrics, such as accuracy, recall, precision, Matthews correlation coefficient (MCC), and F-measure.
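The incremental feature selection step described above can be illustrated with a minimal sketch. It assumes the features have already been ranked (in the study, by the LightGBM feature selection method) and that a scoring callable is available (in the study, this would presumably be the performance of the LightGBM classifier on each candidate subset). The function name `incremental_feature_selection`, the `toy_score` stand-in, and the feature names are hypothetical and used only for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the best-scoring prefix of a ranked feature list.

    ranked_features: feature names ordered from most to least important.
    evaluate: hypothetical callable that scores a feature subset
              (higher is better), e.g. a cross-validated classifier metric.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")

    best_subset: List[str] = []
    best_score = float("-inf")
    # Grow the subset one feature at a time, following the ranking.
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        # Strict comparison: on ties, the smaller subset is kept.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_score(subset: List[str]) -> float:
    """Stand-in scorer (assumption): rewards useful features, penalizes noise."""
    useful = {"f1", "f2", "f3"}
    hits = sum(1 for name in subset if name in useful)
    return hits - 0.1 * (len(subset) - hits)


# Hypothetical ranking, as a feature-importance method might produce it.
ranked = ["f1", "f2", "noise_a", "f3", "noise_b"]
best_subset, best_score = incremental_feature_selection(ranked, toy_score)
print(best_subset, round(best_score, 2))
```

In practice, `evaluate` would train and validate the classifier on the columns named in `subset`; the toy scorer only makes the sketch runnable without external libraries.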
Keywords :
IFS-LightGBM , Succinylation , CKSAAP , Protein