• DocumentCode
    2341291
  • Title
    Fast and sensitive alignment of large genomic sequences
  • Author
    Brudno, Michael ; Morgenstern, Burkhard
  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., CA, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    138
  • Lastpage
    147
  • Abstract
    Comparative analysis of syntenic genome sequences can be used to identify functional sites such as exons and regulatory elements. Here, the first step is to align two or more evolutionarily related sequences, and in recent years a number of computer programs have been developed for the alignment of large genomic sequences. Some of these programs are extremely fast, but time-efficiency is often achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast heuristic identifies a chain of strong sequence similarities that serve as anchor points. In a second step, the regions between these anchor points are aligned using a slower but more sensitive method. We present CHAOS, a novel algorithm for rapid identification of chains of local sequence similarities among large genomic sequences. Similarities identified by CHAOS are used as anchor points to improve the running time of the DIALIGN alignment program. Systematic test runs show that this method can reduce the running time of DIALIGN by more than 93% while affecting the quality of the resulting alignments by only 1%. The source code for CHAOS is available at http://www.stanford.edu/~brudno/chaos/.
  • Keywords
    biology computing; computational complexity; genetics; heuristic programming; CHAOS; DIALIGN; alignment program; anchored-alignment approach; evolutionary related sequences; exons; fast sensitive alignment; heuristic; large genomic sequences; local sequence similarity chain identification; regulatory elements; strong sequence similarities; syntenic genome sequences; time-efficiency; Availability; Bioinformatics; Computer Society; Genetic mutations; Genomics; Large-scale systems; Noise reduction; Organisms; Statistical analysis; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
  • Print_ISBN
    0-7695-1653-X
  • Type
    conf
  • DOI
    10.1109/CSB.2002.1039337
  • Filename
    1039337
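
The two-step anchored-alignment idea described in the abstract can be illustrated with a minimal Python sketch. This is not the CHAOS algorithm and not DIALIGN: it assumes exact k-mer matches as "strong similarities", chains them with a simple quadratic dynamic program, and only reports the regions between anchors that a slower, more sensitive aligner would then process. The function names `find_seeds`, `chain_seeds`, and `gaps_between`, and the parameter `k`, are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return sorted (i, j) pairs where a[i:i+k] == b[j:j+k] (exact k-mer hits)."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


def chain_seeds(seeds, k):
    """Pick a longest chain of collinear, non-overlapping seeds.

    Assumes `seeds` is sorted; uses a simple O(n^2) dynamic program.
    """
    if not seeds:
        return []
    best = [1] * len(seeds)   # best chain length ending at each seed
    prev = [-1] * len(seeds)  # back-pointer to the previous seed in that chain
    for t, (i, j) in enumerate(seeds):
        for s in range(t):
            pi, pj = seeds[s]
            # Seed s may precede seed t only if it ends before t starts
            # in both sequences.
            if pi + k <= i and pj + k <= j and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    t = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while t != -1:
        chain.append(seeds[t])
        t = prev[t]
    return chain[::-1]


def gaps_between(chain, k, len_a, len_b):
    """Return the half-open regions between anchors, as ((a0, a1), (b0, b1))."""
    regions = []
    pa = pb = 0
    for i, j in chain:
        regions.append(((pa, i), (pb, j)))
        pa, pb = i + k, j + k
    regions.append(((pa, len_a), (pb, len_b)))
    return regions


a, b, k = "AAACCCGGG", "AAATCCCTGGG", 3
anchors = chain_seeds(find_seeds(a, b, k), k)
regions = gaps_between(anchors, k, len(a), len(b))
```

In this sketch only the short regions in `regions` would be passed to the sensitive aligner, which is where the running-time saving of an anchored approach comes from; a real tool would also tolerate mismatches in the seeds and score chains rather than just count them.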