DocumentCode :
1134692
Title :
A Novel Heuristic for Local Multiple Alignment of Interspersed DNA Repeats
Author :
Treangen, Todd J. ; Darling, Aaron E. ; Achaz, Guillaume ; Ragan, Mark A. ; Messeguer, Xavier ; Rocha, Eduardo P C
Author_Institution :
Inst. Pasteur, UPMC Univ., Paris
Volume :
6
Issue :
2
Year :
2009
Firstpage :
180
Lastpage :
189
Abstract :
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.
Keywords :
DNA; Markov processes; biology computing; cellular biophysics; genomics; iterative methods; molecular biophysics; MUSCLE implementation; filtration method; gapped extension; genome sequences; genomes; genomic DNA; heuristic; hidden Markov model; homologous nucleotides; interspersed DNA repeats; iterative refinement; local multiple alignment; nucleotide substitutions; pairwise local sequence alignment; posterior probability; time-reversible nucleotide substitution matrix; Bioinformatics; DNA; Filtration; Frequency; Genomics; Hidden Markov models; Matrices; Muscles; Scalability; Sequences; DNA repeats; Sequence alignment; gapped extension; genome comparison; hidden Markov model; local multiple alignment; Base Sequence; Computer Simulation; DNA; DNA, Bacterial; Genome, Bacterial; Interspersed Repetitive Sequences; Markov Chains; Models, Statistical; Molecular Sequence Data; Mycoplasma genitalium; Sequence Alignment; Sequence Homology, Nucleic Acid; Software;
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2009.9
Filename :
4770094