• DocumentCode
    2050259
  • Title

    BMF: Bitmapped Mass Fingerprinting for Fast Protein Identification

  • Author

    Yu, Weikuan ; Wu, K. John ; Ku, Wei-Shinn ; Xu, Cong ; Gao, Juan

  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    17
  • Lastpage
    25
  • Abstract
    Protein identification is an important objective for proteomic and medical sciences as well as for the pharmaceutical industry. With recent large-scale automation of genome sequencing and the explosion of protein databases, it is important to exploit the latest data processing technologies and design highly scalable algorithms to speed up protein identification. In this study, we design, implement, and evaluate a new software tool, Bitmapped Mass Fingerprinting (BMF), that can efficiently construct a bitmap index for short peptides and quickly identify candidate proteins from leading protein databases. BMF is developed by integrating the FastBit indexing technology and the popular Message Passing Interface (MPI) for parallelization. By exploiting FastBit for peptide mass fingerprinting across protein boundaries, we are able to accomplish parallel computation and I/O for a scalable implementation of protein identification. Our experimental results show that BMF brings dramatic performance improvements for protein identification from various protein databases. In particular, we demonstrate that BMF can effectively scale up to 8,192 cores on the Jaguar Supercomputer at Oak Ridge National Laboratory, achieving superb performance in identifying proteins from the National Center for Biotechnology Information (NCBI) non-redundant (NR) protein database.
  • Keywords
    biology computing; database indexing; genetics; message passing; proteins; software tools; Jaguar Supercomputer; MPI; National Center for Biotechnology Information; Oak Ridge National Laboratory; bitmapped mass fingerprinting; data processing technologies; fast bit indexing technology; fast protein identification; genome sequencing; highly scalable algorithms; large-scale automation; medical sciences; message passing interface; nonredundant protein database; parallel computation; peptide mass fingerprinting; pharmaceutical industry; protein databases; proteomic sciences; short peptides; software tool; Amino acids; Fingerprint recognition; Indexes; Microorganisms; Peptides; Proteins; Cray XT5; FastBit; Peptide Mass Fingerprinting; Protein Databases; Protein Identification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2011.11
  • Filename
    6061061