Title :
Extracting decision rules in prediction of protein secondary structure
Author :
Nguyen, Minh N. ; Zurada, Jacek M. ; Rajapakse, Jagath C.
Author_Institution :
Bioinf. Institute & the Bioinf. Res. Centre, Nanyang Technol. Univ., Singapore
Abstract :
Information on the secondary structures of amino acid residues in proteins provides valuable clues for predicting their 3-D structure and function. Although numerous computational techniques have been applied to predict protein secondary structure (PSS), few studies have addressed the discovery of the logic rules underlying the prediction itself. Such rules offer interesting links between the prediction model and the underlying biology. In addition, they enhance the interpretability of PSS prediction by lending a degree of transparency to a predictive model that is usually regarded as a black box. In this paper, we explore the generation and use of C4.5 decision trees to extract relevant rules from PSS predictions modeled with two-stage support vector machines (TS-SVM). Our approach has produced sizable sets of comprehensible, and often interpretable, rules underlying the PSS predictions. Moreover, many of the rules seem to be strongly supported by biological evidence. Further, our approach yielded good prediction accuracy, few and usually compact rules, and rules that generally have higher confidence levels than those generated by other rule extraction techniques. The proposed rules were derived and tested on the RS126 dataset of 126 nonhomologous globular proteins.
Keywords :
biology computing; decision trees; knowledge acquisition; molecular biophysics; molecular configurations; proteins; support vector machines; 3D protein structure; C4.5 decision trees; amino acid residues; decision rules extraction; logic rules; nonhomologous globular proteins; predicting model; protein function; protein secondary structure prediction; rule extraction; two-stage support vector machines; Amino acids; Biological system modeling; Biology computing; Computational biology; Data mining; Decision trees; Logic; Predictive models; Proteins; Support vector machines;
Conference_Title :
BioInformatics and BioEngineering, 2008. BIBE 2008. 8th IEEE International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4244-2844-1
Electronic_ISBN :
978-1-4244-2845-8
DOI :
10.1109/BIBE.2008.4696700