DocumentCode :
2075251
Title :
Generalized suffix trees for biological sequence data: applications and implementation
Author :
Bieganski, Paul ; Riedl, John ; Carlis, John V. ; Retzel, Ernest F.
Author_Institution :
Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
Volume :
5
fYear :
1994
fDate :
4-7 Jan. 1994
Firstpage :
35
Lastpage :
44
Abstract :
This paper addresses applications of suffix trees and generalized suffix trees (GSTs) to biological sequence data analysis. We define a basic set of suffix tree and GST operations needed to support sequence data analysis. While those definitions are straightforward, the construction and manipulation of disk-based GST structures for large volumes of sequence data require intricate design. GST processing is fast because the structure is content addressable, supporting efficient searches for all sequences that contain particular subsequences. Instead of laboriously searching sequences stored as arrays, we search by walking down the tree. We present a new GST-based sequence alignment algorithm, called GESTALT. GESTALT finds all exact matches in parallel, and uses best-first search to extend them to produce alignments. Our implementation experiences with applications using GST structures for sequence analysis lead us to conclude that GSTs are valuable tools for analyzing biological sequence data.
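The content-addressable search described in the abstract can be illustrated with a minimal sketch. It assumes a naive, uncompressed, in-memory suffix trie rather than the disk-based GST structures of the paper, and it does not implement GESTALT. The names `Node`, `build_suffix_trie`, and `find_containing` are hypothetical and chosen only for this illustration.

```python
class Node:
    """One trie node: child links by character, plus the ids of all
    sequences that have a substring spelling the path to this node."""

    def __init__(self):
        self.children = {}
        self.seq_ids = set()


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence into one shared trie.

    This is quadratic in sequence length; a real generalized suffix
    tree compresses single-child paths and is built far more cheaply.
    """
    root = Node()
    for seq_id, seq in enumerate(sequences):
        # The empty pattern is contained in every sequence.
        root.seq_ids.add(seq_id)
        for start in range(len(seq)):
            node = root
            for ch in seq[start:]:
                node = node.children.setdefault(ch, Node())
                node.seq_ids.add(seq_id)
    return root


def find_containing(root, pattern):
    """Walk down the trie along `pattern`; return ids of sequences containing it."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return set()  # no sequence contains this pattern
    return set(node.seq_ids)
```

With the toy sequences `["GATTACA", "TACAGA", "CCGG"]`, `find_containing(root, "TACA")` walks four edges and returns the ids of the first two sequences without scanning any sequence array. The lookup cost depends on the pattern length, not on the total volume of sequence data, which is the property the abstract attributes to GSTs.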
Keywords :
biology computing; cellular biophysics; data analysis; query processing; tree data structures; GESTALT; GST operations; best-first search; biological sequence data; biological sequence data analysis; content addressable; disk-based GST structures; efficient search; generalized suffix trees; genetic coding; sequence alignment algorithm; sequence analysis; sequence data; sequence data analysis; suffix trees;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference on
Conference_Location :
Wailea, HI, USA
Print_ISBN :
0-8186-5090-7
Type :
conf
DOI :
10.1109/HICSS.1994.323593
Filename :
323593