DocumentCode :
415738
Title :
Efficient filtration of sequence similarity search through singular value decomposition
Author :
Aghili, S. Alireza ; Sahin, Ozgur D. ; Agrawal, Divyakant ; El Abbadi, Amr
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA
fYear :
2004
fDate :
19-21 May 2004
Firstpage :
403
Lastpage :
410
Abstract :
Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to mitigate the curse of dimensionality. This paper proposes a novel approach that maps the problem of whole-genome sequence similarity search onto approximate vector comparison in the well-established multidimensional vector space. We propose applying the singular value decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to reduce both the search space and the running time of the search operation. Our empirical results on a prokaryote and a eukaryote DNA contig dataset demonstrate effective filtration that prunes non-relevant portions of the database and runs up to 2.3 times faster than the q-gram approach. SVD filtration may easily be integrated as a pre-processing step for any of the well-known sequence search heuristics, such as BLAST, QUASAR and FastA. We analyze the precision of applying SVD filtration as a transformation-based dimensionality reduction technique, and finally discuss the trade-offs it imposes.
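The filtration idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how sequences are mapped to vectors, so the sketch assumes overlapping k-mer frequency vectors over fixed database windows. The function names (`kmer_vector`, `build_filter`, `candidates`) and the parameters `k`, `rank` and `radius` are hypothetical and are not taken from the paper. Because the SVD basis is orthonormal, distances in the reduced space never exceed distances in the full vector space, so a window within `radius` of the query in the full space is never pruned.

```python
import itertools
import numpy as np

ALPHABET = "ACGT"

def kmer_vector(window, k=2):
    """Count overlapping k-mers of `window` into a 4**k-dimensional vector."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(ALPHABET, repeat=k))}
    vec = np.zeros(len(index))
    for i in range(len(window) - k + 1):
        j = index.get(window[i:i + k])
        if j is not None:  # skip k-mers with symbols outside ACGT
            vec[j] += 1
    return vec

def build_filter(windows, k=2, rank=4):
    """Project the k-mer vectors of all windows onto the top singular directions."""
    X = np.array([kmer_vector(w, k) for w in windows])
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    basis = vt[:rank].T      # orthonormal columns, shape (4**k, <= rank)
    reduced = X @ basis      # low-dimensional database vectors
    return basis, reduced

def candidates(query, basis, reduced, radius, k=2):
    """Return indices of windows whose reduced vector is within `radius` of the query."""
    q = kmer_vector(query, k) @ basis
    dist = np.linalg.norm(reduced - q, axis=1)
    return np.flatnonzero(dist <= radius)

if __name__ == "__main__":
    # Toy data, invented for illustration only.
    windows = ["ACGTACGTAC", "AAAAAAAAAA", "ACGTACGTAA", "GGGGCCCCGG"]
    basis, reduced = build_filter(windows, k=2, rank=2)
    print(candidates("ACGTACGTAC", basis, reduced, radius=2.0))
```

Only the windows returned by `candidates` would then be passed to a full search heuristic. A smaller `rank` makes the filter cheaper but lets more non-relevant windows through, which is the precision trade-off the abstract refers to.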
Keywords :
DNA; biology computing; database indexing; molecular biophysics; query processing; sequences; DNA contig dataset; bioinformatics; filtration; indexing; sequence similarity search; singular value decomposition; textual databases; transformation-based dimensionality reduction technique; Bioinformatics; Computer science; DNA; Databases; Filtration; Genomics; Indexing; Multidimensional systems; Sequences; Singular value decomposition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on
Print_ISBN :
0-7695-2173-8
Type :
conf
DOI :
10.1109/BIBE.2004.1317371
Filename :
1317371