DocumentCode :
2897191
Title :
Nested similarity searching for elucidation of evolutionary distant sequences
Author :
Jean, Angela ; Lin, Feng ; Tong, Joo Chuan
Author_Institution :
Dept. of Biochem., Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2010
fDate :
6-8 Oct. 2010
Firstpage :
266
Lastpage :
271
Abstract :
Large sets of related gene and protein data are often collected and examined to deduce their relationships and to provide insight into their evolution. Typically, sequences from primitive organisms have undergone various mutations to give rise to orthologous sequences in more modern organisms. Heuristic tools are suitable for quick retrieval of similar sequences from databases. However, they are often unable to sieve out sequences that are evolutionarily related but distant. Other tools are pattern-centric and focus on recurring conserved domains, but they also cannot retrieve sequences of primitive organisms that have evolved through the addition or deletion of functional domains that are present only in more modern organisms, or vice versa. To solve this problem, we devised a new algorithm that performs a nested search on BLAST results. With this algorithm, we are able to elucidate sequences that would otherwise have eluded a single-pass search. In addition, because each retrieved sequence can be related back to the query sequence, a path of evolving sequences can be traced. Furthermore, by identifying tasks that can be executed concurrently, we parallelized the algorithm so that it can be executed in a distributed environment. This avoids prohibitive running times for large-scale dataset searches while ensuring the integrity of the results. Our experiments showed the effectiveness and efficiency of the algorithm running in a multi-processor, distributed environment. Although the process is resource intensive, this cost is mitigated by pipelining and parallelizing parts of the algorithm to maximize the use of computing resources and minimize idle time. In addition, iterative searching with full-length sequences ensures that the resulting set of related sequences can be used effectively in comparative evolutionary studies.
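The nested search described in the abstract can be pictured as a breadth-first expansion: every hit from one round is re-submitted, full-length, as a query in the next round, and each newly found sequence records which sequence retrieved it, so a path back to the original query can be traced. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The hypothetical `search` function uses a toy k-mer Jaccard similarity in place of BLAST. The names `nested_search` and `trace_path`, the parameters `threshold`, `max_rounds` and `workers`, and the toy sequences are all illustrative. A local thread pool stands in for the distributed environment, running the independent searches of each round concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Toy Jaccard similarity of k-mer sets (assumed stand-in for a BLAST score)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query_seq, database, threshold):
    """Hypothetical single-pass search: ids of entries scoring >= threshold."""
    return [sid for sid, seq in database.items()
            if similarity(query_seq, seq) >= threshold]


def nested_search(query_id, database, threshold=0.3, max_rounds=10, workers=4):
    """Re-query every new hit with its full-length sequence until nothing new is found.

    query_id must be a key of database. Returns a dict mapping each retrieved id
    to the id whose search found it (None for the original query).
    """
    parent = {query_id: None}
    frontier = [query_id]
    rounds = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and rounds < max_rounds:
            # Searches within one round are independent, so they run concurrently.
            results = pool.map(
                lambda sid: (sid, search(database[sid], database, threshold)),
                frontier,
            )
            next_frontier = []
            for source, hits in results:  # pool.map preserves frontier order
                for hit in hits:
                    if hit not in parent:
                        parent[hit] = source
                        next_frontier.append(hit)
            frontier = next_frontier
            rounds += 1
    return parent


def trace_path(parent, target):
    """Follow parent links from target back to the original query."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical toy "sequences": A and C share no 3-mers, but both overlap B.
    db = {"A": "ABCDEFGH", "B": "DEFGHIJK", "C": "GHIJKLMN", "D": "QRSTUVWX"}
    print(search(db["A"], db, 0.3))   # ['A', 'B']
    parent = nested_search("A", db)
    print(trace_path(parent, "C"))    # ['A', 'B', 'C']
```

In this toy database, a single search with A retrieves only A and B. C surfaces in the second round, through B, which mirrors the abstract's point about sequences that elude a single-pass search, and the parent links give the traced path A, B, C.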
Keywords :
biology computing; genomics; query formulation; sequences; very large databases; BLAST results; computing resources; databases; distributed environment; elucidation; evolutionary distant sequences; gene data; heuristic tools; iterative searching; large-scale dataset; nested similarity searching; orthologous sequences; parallelizing parts; pipelining parts; protein data; query sequence; resource intensive process; sequences retrieval; single-pass searching process; Algorithm design and analysis; Bioinformatics; Databases; Heuristic algorithms; Organisms; Proteins; Runtime; Bioinformatics; comparative genomics; evolution; parallelized and pipelined computing; similarity searching;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing Systems (SIPS), 2010 IEEE Workshop on
Conference_Location :
San Francisco, CA
ISSN :
1520-6130
Print_ISBN :
978-1-4244-8932-9
Electronic_ISBN :
1520-6130
Type :
conf
DOI :
10.1109/SIPS.2010.5624799
Filename :
5624799