  • DocumentCode
    1991316
  • Title
    Parallel Large Scale Inference of Protein Domain Families
  • Author
    Kahn, Daniel; Rezvoy, Clément; Vivien, Frédéric
  • Author_Institution
    INRIA, Villeurbanne, France
  • fYear
    2008
  • fDate
    8-10 Dec. 2008
  • Firstpage
    72
  • Lastpage
    79
  • Abstract
    The resolution of combinatorial assortments of protein sequences into domains is a prerequisite for protein sequence interpretation. However, the recognition and clustering of homologous domains from sequence databases typically scales quadratically with database size, which itself grows exponentially. This makes it essential to parallelize these complex bioinformatics applications. Here we demonstrate the parallelization of MKDOM2, the sequential program that has been instrumental in the construction of the PRODOM database of protein domain families. This was challenging because of (1) dependencies between program iterations, (2) their extremely heterogeneous run times, and (3) communication bottlenecks that could arise from the large size of the data. A large-scale test of the new program, MPI_MKDOM2, demonstrated its robustness against heterogeneous run times, preparing the ground for future releases of PRODOM that would otherwise be out of reach of MKDOM2 by several orders of magnitude.
  • Keywords
    bioinformatics; grid computing; message passing; parallel programming; proteins; MKDOM2; MPI_MKDOM2; PRODOM database; complex bioinformatics applications; parallel large scale inference; protein domain families; protein sequence interpretation; sequence databases; Databases; Genomics; Instruments; Iterative algorithms; Large-scale systems; Protein engineering; Protein sequence; Robustness; Testing; Protein Domains; Sequence clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • ISSN
    1521-9097
  • Print_ISBN
    978-0-7695-3434-3
  • Type
    conf
  • DOI
    10.1109/ICPADS.2008.115
  • Filename
    4724305