Frequent substructures and fold classification from protein contact maps

Author

Suvarna Vani, K. ; Om Swaroopa, M. ; Sravani, T.D. ; Praveen Kumar, K.

Author_Institution

Dept. of Comput. Sci. & Eng., V.R. Siddhartha Eng. Coll., Vijayawada, India

fYear

2014

fDate

21-24 May 2014

Firstpage

Lastpage

Abstract

The three-dimensional structure of proteins enables them to carry out biophysical and biochemical functions in a cell. Approaches to protein structure/fold classification typically extract amino acid sequence features and apply machine-learning methods to the classification problem. Protein contact maps are two-dimensional representations of the contacts among amino acid residues in the folded protein structure. Many researchers have noted that secondary structures are clearly visible in contact maps, where alpha-helices appear as thick bands and beta-sheets as bands orthogonal to the diagonal. Some patterns in the off-diagonal regions of contact maps correspond to configurations of protein secondary structures. This paper explores the idea of extracting rules from contact maps to represent fold information. Contact maps are generated for proteins of any length, and an efficient way to extract Secondary Structure Elements from contact maps is adopted. This method performs appreciably well when its output is compared with the original Secondary Structure Elements. Frequent substructures are extracted with SUBDUE, a graph-based pattern learning system, for six folds in the All-Alpha structural class. The extracted substructures are mapped to the three-dimensional structure to demonstrate the performance of the approach. To extract additional features from the off-diagonal contact map, the Triangle Sub Division Method is implemented, and the feature set is extended to 20 regions of interest. An accuracy of 70% is achieved by the J48 decision tree classifier. The decision tree results give insight into the rules generated for each structural class. Differences in the regions of interest are identified for the All-Alpha structural class. The method remains to be validated on other SCOP classes.
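
As an illustration of the contact-map representation described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes one representative coordinate per residue (for example the C-alpha atom) and an 8 angstrom distance cutoff; the abstract states neither, and the function name `contact_map` and the toy coordinates are hypothetical.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix of residue contacts.

    coords: one (x, y, z) tuple per residue (assumed representative atom).
    threshold: distance cutoff in angstroms (assumed value, not from the paper).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Mark residues i and j as in contact when they lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy coordinates for four residues (hypothetical, not a real structure).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy)
for row in cmap:
    print(row)
```

In a map built this way, contacts between residues that are close in sequence fall near the main diagonal, while the off-diagonal entries record contacts between residues that are distant in sequence.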

Keywords

biochemistry; bioinformatics; cellular biophysics; data mining; decision trees; feature extraction; graphs; learning (artificial intelligence); molecular biophysics; molecular configurations; pattern classification; proteins; J48 decision tree classifier; SCOP classes; SUBDUE; all-alpha structural class; alpha helices; amino acid residues; amino acid sequence features; biochemical functions; biophysical functions; cell; feature set; folded protein structure; frequent substructures; graph based pattern learning system; machine-learning approaches; off-diagonal contact maps; orthogonal-diagonal beta-sheets; protein contact maps; protein structure-fold classification; regions-of-interest; secondary structure elements; secondary structures; three-dimensional protein structure; triangle subdivision method; two-dimensional representations; Accuracy; Amino acids; Data mining; Decision trees; Feature extraction; Proteins; Vectors; Association rule mining; Frequent patterns; Mining protein contact maps; Protein fold prediction;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on

Conference_Location

Honolulu, HI

Type

conf

DOI

10.1109/CIBCB.2014.6845518

Filename

6845518

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=167295