DocumentCode :
167295
Title :
Frequent substructures and fold classification from protein contact maps
Author :
Suvarna Vani, K. ; Om Swaroopa, M. ; Sravani, T.D. ; Praveen Kumar, K.
Author_Institution :
Dept. of Comput. Sci. & Eng., V.R. Siddhartha Eng. Coll., Vijayawada, India
fYear :
2014
fDate :
21-24 May 2014
Firstpage :
1
Lastpage :
8
Abstract :
The three-dimensional structure of a protein enables it to carry out its biophysical and biochemical functions in a cell. Approaches to protein structure/fold classification typically extract features from the amino acid sequence and then apply machine-learning methods to the classification problem. Protein contact maps are two-dimensional representations of the contacts among amino acid residues in the folded protein structure. Many researchers have noted that secondary structures are clearly visible in contact maps: alpha-helices appear as thick bands and beta-sheets as bands orthogonal to the diagonal. Some patterns in the off-diagonal regions of contact maps correspond to configurations of protein secondary structures. This paper explores the idea of extracting rules from contact maps to represent fold information. Contact maps are generated for proteins of any length, and an efficient way to extract Secondary Structure Elements from contact maps is adopted. This method achieves appreciable performance when compared against the original Secondary Structure Elements. Frequent substructures are extracted by applying SUBDUE, a graph-based pattern learning system, to six folds in the All-Alpha structural class. The extracted substructures are mapped back to the three-dimensional structure, demonstrating the performance of the work. To extract additional features from the off-diagonal contact map, the Triangle Subdivision Method is implemented and the feature set is extended to 20 regions of interest. The J48 decision tree classifier achieves an accuracy of 70%. The decision tree results provide an understanding of the rules generated for each structural class. Differences in the regions of interest are identified for the All-Alpha structural class. The method remains to be validated on other SCOP classes.
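As an illustration of the contact-map representation described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes one 3D coordinate per residue (for example the C-alpha atom) and a distance cutoff of 8 Å; neither choice is stated in the abstract, and the function name `contact_map` and the toy coordinates are hypothetical.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix: entry [i][j] is 1 when residues
    i and j lie within `threshold` of each other (assumed cutoff)."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Euclidean distance between the two residue coordinates
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy coordinates (hypothetical): three residues spaced 3.8 apart
# along one axis, plus one distant residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

In this toy case the first three residues are all in contact with one another, while the fourth is in contact only with itself, so the map shows a dense block near the diagonal and an empty off-diagonal region.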
Keywords :
biochemistry; bioinformatics; cellular biophysics; data mining; decision trees; feature extraction; graphs; learning (artificial intelligence); molecular biophysics; molecular configurations; pattern classification; proteins; J48 decision tree classifier; SCOP classes; SUBDUE; all-alpha structural class; alpha helices; amino acid residues; amino acid sequence features; biochemical functions; biophysical functions; cell; feature set; folded protein structure; frequent substructures; graph based pattern learning system; machine-learning approaches; off-diagonal contact maps; orthogonal-diagonal beta-sheets; protein contact maps; protein structure-fold classification; regions-of-interest; secondary structure elements; secondary structures; three-dimensional protein structure; triangle subdivision method; two-dimensional representations; Accuracy; Amino acids; Data mining; Decision trees; Feature extraction; Proteins; Vectors; Association rule mining; Frequent patterns; Mining protein contact maps; Protein fold prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location :
Honolulu, HI
Type :
conf
DOI :
10.1109/CIBCB.2014.6845518
Filename :
6845518