DocumentCode
950269
Title
An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System
Author
Jiang, Karl ; Thorsen, Oystein ; Peters, Amanda ; Smith, Brian ; Sosa, Carlos P.
Author_Institution
IBM, Rochester
Volume
19
Issue
1
fYear
2008
Firstpage
15
Lastpage
23
Abstract
Bioinformatics databases used for sequence comparison and sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on hidden Markov models (HMMs) have proven to be computationally intensive and, hence, amenable to architectures with multiple processors. In this paper, we describe a modified version of the original parallel implementation of HMMs on a massively parallel system. This is part of the HMMER bioinformatics code. HMMER 2.3.2 uses profile HMMs for sensitive database searching based on statistical descriptions of a sequence family's consensus (Durbin et al., 1998). Two of the nine programs were further parallelized to take advantage of the large number of processors, namely, hmmsearch and hmmpfam. For our study, we start by porting the parallel virtual machine (PVM) versions of these two programs currently available as part of the HMMER suite of programs. We report the performance of these nonoptimized versions as baselines. Our work also includes the introduction of an alternate sequence file indexing, multiple-master configuration, dynamic data collection and, finally, load balancing via the indexed sequence files. This set of optimizations constitutes our modified version for massively parallel systems. Our results show parallel performance improvements of more than one order of magnitude (16 times) for hmmsearch and hmmpfam.
Keywords
biology computing; database indexing; genetics; hidden Markov models; resource allocation; virtual machines; HMMER 2.3.2; HMMER bioinformatics code; alternate sequence file indexing; bioinformatics databases; database searches; dynamic data collection; genomic sequence search; hidden Markov models; hmmpfam; hmmsearch; load balancing; massively parallel systems; multiple processors; multiple-master configuration; nonoptimized versions; parallel virtual machine; sensitive database searching; sequence comparison; HMMER; Hidden Markov models; bioinformatics; genomic sequence-search; massively parallel systems; multiple master parallelization; parallel implementation
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2007.70712
Filename
4359412