Title :
Confirming biological significance of co-occurrence clusters of aligned pattern clusters
Author :
Lee, En-Shiun Annie ; Fung, S. ; Sze-To, Ho-Yin ; Wong, Andrew K. C.
Author_Institution :
Syst. Design Eng., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
Advances in bioinformatics have provided researchers with a large influx of novel sequences, making the analysis of these sequences for inherent biological knowledge crucial. Applying pattern discovery and pattern synthesis to protein family sequences allows conserved protein segments to be represented as Aligned Pattern Clusters (APCs), which carry richer statistical-association knowledge than probabilistic models. This representation enables us to exploit the co-occurrence of APCs on the same protein sequence to identify functional regions. In this paper, we develop an efficient algorithm that identifies frequently co-occurring patterns using only homologous protein sequences as input. We applied the algorithm to triosephosphate isomerase and ubiquitin for a detailed study. By comparing with the corresponding 3D structures, we found that in most cases the discovered co-occurring patterns are spatially close. We also found that the co-occurrence of patterns is biologically significant: residues that play important, co-operative roles in the glycolytic pathway of triosephosphate isomerase, and residues responsible for ubiquitination and ubiquitin binding in ubiquitin, are all covered by our co-occurring APCs. These results demonstrate the ability of our algorithm to reveal concurrent functional and structural relations between distant segments of protein sequences based on co-occurrence clusters of APCs.
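The abstract does not spell out the co-occurrence algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: each homologous sequence is reduced to the set of APC identifiers matched on it (a hypothetical input format), APC pairs are counted by how many sequences contain both, pairs below a hypothetical min_support threshold are discarded, and the surviving pairs are grouped into connected components. The function names cooccurring_pairs and cooccurrence_clusters are illustrative, not the authors' API. The keyword list mentions K-means clustering, so the published method may group patterns differently.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(seq_to_apcs, min_support):
    """Return {(a, b): count} for APC pairs found together on at least
    min_support sequences.

    seq_to_apcs maps a sequence id to the set of APC ids matched on that
    sequence (a hypothetical input format assumed for this sketch)."""
    counts = Counter()
    for apcs in seq_to_apcs.values():
        # Sorting gives each unordered pair one canonical (a, b) key.
        for pair in combinations(sorted(set(apcs)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def cooccurrence_clusters(pairs):
    """Group APCs linked by frequent pairs into connected components.

    This is a simple stand-in for a clustering step, not the paper's method."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        clusters.append(sorted(comp))
    return clusters


if __name__ == "__main__":
    # Toy data: APC ids matched on five hypothetical homologous sequences.
    seqs = {
        "s1": {"A1", "A2", "A3"},
        "s2": {"A1", "A2"},
        "s3": {"A1", "A2", "A4"},
        "s4": {"A3", "A4"},
        "s5": {"A3", "A4", "A5"},
    }
    pairs = cooccurring_pairs(seqs, min_support=2)
    print(pairs)                         # {('A1', 'A2'): 3, ('A3', 'A4'): 2}
    print(cooccurrence_clusters(pairs))  # [['A1', 'A2'], ['A3', 'A4']]
```

On this toy input, A1 and A2 appear together on three sequences and A3 and A4 on two, so they form two co-occurrence groups; every other pair appears only once and is filtered out. Counting support per sequence rather than per occurrence keeps a pattern repeated within one sequence from inflating its co-occurrence score.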
Keywords :
biochemistry; bioinformatics; bonds (chemical); data mining; enzymes; macromolecules; molecular biophysics; molecular configurations; molecular orientation; pattern clustering; sequences; statistical analysis; 3D structure comparison; aligned pattern clusters; cooccurrence cluster biological significance confirmation; cooccurring APC; distant protein sequence functional relation; frequently cooccurring pattern identification; glycolytic pathway; homologous protein sequences; inherent biological knowledge; knowledge-rich APC; probabilistic models; protein family sequence pattern discovery; protein family sequence pattern synthesis; protein functional region identification; protein segment representation; protein sequence structural relation; residues; sequence analysis; spatial distance; triosephosphate isomerase; ubiquitin binding; ubiquitination; Amino acids; Clustering algorithms; Educational institutions; Indexes; Protein sequence; Clustering; Co-occurrence; K-means clustering; Pattern; Sequence; Triosephosphate isomerase; Ubiquitin;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732529