DocumentCode :
3453280
Title :
On the repetitive collection indexing problem
Author :
Alatabbi, A. ; Barton, Christopher ; Iliopoulos, Costas S.
Author_Institution :
Dept. of Inf., King's Coll. London, London, UK
fYear :
2012
fDate :
4-7 Oct. 2012
Firstpage :
682
Lastpage :
687
Abstract :
In large data sets such as genomes from a single species, large sets of reads, and version control data, each entry often differs from the others by only a very small number of variations. The result is a large body of data with a great deal of redundancy and repetitiveness. Rapid development in DNA sequencing technologies has caused drastic growth in the size of publicly available sequence databases holding such data. DNA sequencing has become so fast and cost-effective that sequencing individual genomes will soon become a common task [9], which makes storing and querying such data sets an important problem. In this paper, we propose an indexing structure for highly repetitive collections of sequence data based on a multilevel g-gram model. In particular, the proposed algorithm accommodates variations that may occur in the target sequence with respect to the reference sequence. The paper is organized as follows. Sections 1 and 2 introduce the basic concepts and review the related literature. Section 3 presents notions and facts. Sections 4 and 5 give details of the proposed data structure and algorithm, Section 6 discusses the complexity analysis, and Section 7 gives conclusions and future work.
Keywords :
DNA; biology computing; genomics; indexing; query processing; storage management; DNA sequencing technologies; data querying; data storage; individual genome sequencing; multilevel g-gram model; reference sequence; repetitive collection indexing problem; sequence database; target sequence; Arrays; DNA; Entropy; Genomics; Humans; Indexes; Data Compression; Genome Databases; Index Structure; Relative Compression;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4673-2746-6
Electronic_ISBN :
978-1-4673-2744-2
Type :
conf
DOI :
10.1109/BIBMW.2012.6470220
Filename :
6470220