Title :
Mining Protein Sequence Databases for Remote Homologues That Can Display Considerable Domain Length Variations
Author :
Mutt, Eshita ; Mitra, Abhijit ; Sowdhamini, R.
Author_Institution :
Nat. Centre for Biol. Sci. (TIFR), Bangalore, India
Abstract :
Protein domains are minimal structural units that can fold independently and carry out discrete biological functions. Evolutionary divergence amongst proteins not only causes considerable sequence changes in protein domains of similar folds and functions, but can also give rise to remarkable length variations. Rapid, heuristic sequence search algorithms are generally sensitive and effective in recognizing distantly related protein domains within large sequence databases, but are not well suited to identifying remote homologues of varying lengths. It is also challenging to distinguish reliable hits from a vast number of putative false positives that could have suboptimal sequence similarities. Here, we present a data-mining approach that applies stage-specific filters in sequence searches to reliably accumulate remote homologues, encouraging the sampling of length variations without compromising the validity of the distant relationships already identified. Recognition of remote homologues with marked length variations could contribute to a better understanding of functional variety within protein domain superfamilies.
Keywords :
bioinformatics; data mining; proteins; search problems; data mining approach; discrete biological functions; domain length variation; evolutionary divergence; heuristic sequence search algorithms; protein sequence database mining; remote homologues; suboptimal sequence similarities; Data mining; Databases; Hidden Markov models; Pipelines; Proteins; Protocols; biological function; protein indels; protein superfamily; remote homology; sequence analysis; sequence data mining
Conference_Title :
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-0005-6
DOI :
10.1109/ICDMW.2011.122