Title :
BMF: Bitmapped Mass Fingerprinting for Fast Protein Identification
Author :
Yu, Weikuan ; Wu, K. John ; Ku, Wei-Shinn ; Xu, Cong ; Gao, Juan
Abstract :
Protein identification is an important objective for the proteomic and medical sciences as well as for the pharmaceutical industry. With the recent large-scale automation of genome sequencing and the explosion of protein databases, it is important to exploit the latest data processing technologies and design highly scalable algorithms to speed up protein identification. In this study, we design, implement, and evaluate a new software tool, Bitmapped Mass Fingerprinting (BMF), that can efficiently construct a bitmap index for short peptides and quickly identify candidate proteins from leading protein databases. BMF is developed by integrating the FastBit indexing technology with the popular Message Passing Interface (MPI) for parallelization. By exploiting FastBit for peptide mass fingerprinting across protein boundaries, we are able to accomplish parallel computation and I/O for a scalable implementation of protein identification. Our experimental results show that BMF brings a dramatic performance improvement for protein identification from various protein databases. In particular, we demonstrate that BMF can effectively scale up to 8,192 cores on the Jaguar supercomputer at Oak Ridge National Laboratory, achieving superb performance in identifying proteins from the National Center for Biotechnology Information (NCBI) non-redundant (NR) protein database.
Keywords :
biology computing; database indexing; genetics; message passing; proteins; software tools; Jaguar Supercomputer; MPI; National Center for Biotechnology Information; Oak Ridge National Laboratory; bitmapped mass fingerprinting; data processing technologies; fast bit indexing technology; fast protein identification; genome sequencing; highly scalable algorithms; large-scale automation; medical sciences; message passing interface; nonredundant protein database; parallel computation; peptide mass fingerprinting; pharmaceutical industry; protein databases; proteomic sciences; short peptides; software tool; Amino acids; Fingerprint recognition; Indexes; Microorganisms; Peptides; Proteins; Cray XT5; FastBit; Peptide Mass Fingerprinting; Protein Databases; Protein Identification
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
DOI :
10.1109/CLUSTER.2011.11