Title :
On Integrating Peptide Sequence Analysis and Relational Distance-Based Indexing
Author :
Xu, Weijia ; Mao, Rui ; Wang, Shu ; Miranker, Daniel P.
Author_Institution :
Dept. of Comput. Sci., Texas Univ., Austin, TX
Abstract :
Managing data with distance-based indexing methods has the potential to provide scalability and integration with relational database management systems and the SQL programming model. We previously demonstrated the advantages of such an approach for nucleotide sequences using Hamming distance (number of mismatches). However, the larger alphabet of peptide sequences increases the dimensionality of the problem, making good algorithmic performance harder to achieve. The development of a metric-PAM (mPAM) substitution matrix enables metric-distance-based indexing of peptide sequences. The performance of distance-based indexing for homologous protein retrieval entails a trade-off among accuracy, scalability, and computational cost. We investigate the application of the multi-vantage point (MVP) tree algorithm to index peptide k-mers based on global mPAM alignment. We show that k-mer retrieval can still maintain accuracy when k is at least 6, a value that creates a domain of over 60 million key values and enables scalability sufficient for effective performance on large disk-resident sequence databases.
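The abstract describes indexing fixed-length peptide k-mers in a metric tree and answering similarity queries by distance. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It uses a simple single-vantage-point (VP) tree in place of the multi-vantage point (MVP) tree studied in the paper. It also replaces mPAM, whose values are not given here, with a hypothetical placeholder residue distance (0 for identical residues, 1 otherwise), and it replaces global alignment with a gapless position-by-position sum. All function names are illustrative. The sketch also checks the domain-size arithmetic: with 20 amino acids, 20^6 = 64,000,000 possible 6-mers.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

Dist = Callable[[str, str], int]


def kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping k-mer of a peptide sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def residue_dist(a: str, b: str) -> int:
    # HYPOTHETICAL placeholder for an mPAM residue distance: 0 if the residues
    # are identical, 1 otherwise (a valid metric). Swap in real mPAM values here.
    return 0 if a == b else 1


def kmer_dist(x: str, y: str) -> int:
    # Gapless, position-by-position sum of residue distances between two
    # equal-length k-mers; a sum of per-position metrics is itself a metric.
    return sum(residue_dist(a, b) for a, b in zip(x, y))


@dataclass
class Node:
    vantage: str               # vantage-point k-mer stored at this node
    radius: int                # median distance from vantage to the other points
    inside: Optional["Node"]   # points with dist(vantage, p) <= radius
    outside: Optional["Node"]  # points with dist(vantage, p) > radius


def build(points: List[str], dist: Dist, rng: random.Random) -> Optional[Node]:
    """Build a single-vantage-point tree by recursive median partitioning."""
    if not points:
        return None
    i = rng.randrange(len(points))
    vp, rest = points[i], points[:i] + points[i + 1:]
    if not rest:
        return Node(vp, 0, None, None)
    ds = [dist(vp, p) for p in rest]
    radius = sorted(ds)[len(ds) // 2]
    inside = [p for p, d in zip(rest, ds) if d <= radius]
    outside = [p for p, d in zip(rest, ds) if d > radius]
    return Node(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def range_query(node: Optional[Node], q: str, r: int, dist: Dist,
                out: List[str]) -> None:
    """Append every indexed k-mer p with dist(q, p) <= r to `out`."""
    if node is None:
        return
    d = dist(q, node.vantage)
    if d <= r:
        out.append(node.vantage)
    # Triangle inequality: any match p satisfies d - r <= dist(vantage, p) <= d + r,
    # so a subtree is searched only if that interval overlaps its distance range.
    if d - r <= node.radius:
        range_query(node.inside, q, r, dist, out)
    if d + r > node.radius:
        range_query(node.outside, q, r, dist, out)


if __name__ == "__main__":
    print(len(AMINO_ACIDS) ** 6)  # 64000000 possible 6-mers
    db = kmers("MKTAYIAKQRQISFVKSHFSRQ", 6)  # toy sequence, 17 overlapping 6-mers
    tree = build(db, kmer_dist, random.Random(0))
    hits: List[str] = []
    range_query(tree, "MKTAYI", 1, kmer_dist, hits)
    print(hits)  # ['MKTAYI']
```

The pruning in `range_query` relies only on the triangle inequality, so any true metric over k-mers can be plugged in for `kmer_dist`. An MVP tree generalizes this scheme by using several vantage points per node to cut the search space further.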
Keywords :
SQL; biological techniques; biology computing; database indexing; molecular biophysics; proteins; relational databases; tree data structures; Hamming distance; SQL programming model; distance-based indexing methods; homologous protein retrieval; k-mer retrieval; metric-PAM substitution matrix; multivantage point tree algorithm; nucleotide sequences; peptide k-mers indexing; peptide sequence analysis; relational database management systems; Computational efficiency; Indexing; Information retrieval; Matrices; Peptides; Proteins; Relational databases; Scalability; Sequences
Conference_Title :
BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on
Conference_Location :
Arlington, VA
Print_ISBN :
0-7695-2727-2
DOI :
10.1109/BIBE.2006.253312