DocumentCode :
766023
Title :
Multiseed lossless filtration
Author :
Kucherov, Gregory ; Noé, Laurent ; Roytberg, Mikhail
Author_Institution :
INRIA/LORIA, Villers-Les-Nancy, France
Volume :
2
Issue :
1
fYear :
2005
Firstpage :
51
Lastpage :
61
Abstract :
We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Kärkkäinen. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
Keywords :
biology computing; dynamic programming; molecular biophysics; pattern recognition; EST sequence database; approximate string matching; bioinformatics; combinatorial properties; multiseed lossless filtration; oligonucleotide selection; seed families; seed-based lossless filtration; Bioinformatics; Databases; Dynamic programming; Filtering; Filtration; Heuristic algorithms; Large-scale systems; Loss measurement; Matched filters; Sequences; EST; Filtration; dynamic programming; gapped q-gram; gapped seed; local alignment; multiple spaced seeds; oligonucleotide selection; seed family; sequence similarity; string matching; Algorithms; Base Sequence; Expressed Sequence Tags; Molecular Sequence Data; Sequence Alignment; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2005.12
Filename :
1416851