DocumentCode :
2732191
Title :
CPS-tree: A Compact Partitioned Suffix Tree for Disk-based Indexing on Large Genome Sequences
Author :
Swee-Seong Wong ; Wing-Kin Sung ; Limsoon Wong
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
fYear :
2007
fDate :
15-20 April 2007
Firstpage :
1350
Lastpage :
1354
Abstract :
The suffix tree is an important data structure for indexing a long sequence (such as a genome sequence) or a concatenation of sequences. It finds many applications in practice, especially in the domain of bioinformatics. A suffix tree allows for efficient pattern search in time independent of the sequence length. However, the performance of a disk-based suffix tree is a concern, as it is slowed down significantly by poor locality of access, resulting in high disk I/O. The focus of this paper is to design an I/O-efficient and compact partitioned suffix tree representation (CPS-tree) on disk. We show that representing a suffix tree using the CPS-tree has several advantages. First, our representation allows us to visit any node in the suffix tree by accessing at most log n pages of the tree, where n is the length of the sequence. Second, our storage scheme improves the access pattern and reduces the number of page faults, resulting in efficient search retrieval and efficient tree traversal operations. Third, by bit packing, our index is compact. Experimental results show that the CPS-tree outperforms other indexes on disk. When fully loaded into main memory, the CPS-tree is still efficient. Hence, we expect the CPS-tree to be a good disk-based representation of the suffix tree, with potential use in practical applications.
Keywords :
biology computing; genetics; indexing; query formulation; sequences; storage management; tree data structures; bioinformatics; bit packing; compact partitioned suffix tree; data structure; disk-based indexing; disk-based representation; large genome sequences; pattern search; search retrieval; storage scheme; tree traversal operations; Bioinformatics; Databases; Genomics; Humans; Indexing; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0802-4
Type :
conf
DOI :
10.1109/ICDE.2007.369009
Filename :
4221799