• DocumentCode
    451246
  • Title
    A Million-Fold Speed Improvement in Genomic Repeats Detection
  • Author
    Romein, John W. ; Heringa, Jaap ; Bal, Henri E.
  • Author_Institution
    Vrije Universiteit, Amsterdam, The Netherlands
  • fYear
    2003
  • fDate
    15-21 Nov. 2003
  • Firstpage
    20
  • Lastpage
    20
  • Abstract
    This paper presents a novel, parallel algorithm for generating top alignments. Top alignments are used for finding internal repeats in biological sequences like proteins and genes. Our algorithm replaces an older, sequential algorithm (Repro), which was prohibitively slow for sequence lengths higher than 2000. The new algorithm is an order of magnitude faster (O(n^3) rather than O(n^4)). The paper presents a three-level parallel implementation of the algorithm: using SIMD multimedia extensions found on present-day processors (a novel technique that can be used to parallelize any application that performs many sequence alignments), using shared-memory parallelism, and using distributed-memory parallelism. It allows processing the longest known proteins (nearly 35000 amino acids). We show exceptionally high speed improvements: between 548 and 889 on a cluster of 64 dual-processor machines, compared to the new sequential algorithm. Especially for long sequences, extreme speed improvements over the old algorithm are obtained.
  • Keywords
    Amino acids; Bioinformatics; Clustering algorithms; Computer science; Evolution (biology); Genomics; Parallel processing; Permission; Proteins; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Supercomputing, 2003 ACM/IEEE Conference
  • Print_ISBN
    1-58113-695-1
  • Type
    conf
  • DOI
    10.1109/SC.2003.10018
  • Filename
    1592923