Title :
Efficient Updating of Biological Sequence Analyses
Author :
Hong, Changjin ; Tewfik, Ahmed H.
Author_Institution :
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN
fDate :
6/1/2008
Abstract :
We present a novel approach for reducing the computational complexity of updating homologies produced by a wide class of popular state-of-the-art algorithms in comparative computational biology. The algorithms that we consider use hidden Markov models (HMMs) and a Viterbi recursion to evaluate matches between sequences, or between a sequence and a model. Such updates occur frequently in practice as researchers discover errors in biological sequences or analyze multiple nearly identical sequences, e.g., a family of proteins that underwent mutations during evolution. The proposed algorithm interprets the Viterbi recursion as an update of an optimal minimum spanning tree in a shortest path problem. We propose the novel concept of a relative node tolerance bound and show how it can be used to guarantee that one or more partial subtrees of a minimum spanning tree obtained before encountering the perturbations remain optimal. We also describe how to compute and use the relative node tolerance bounds in real time to skip most unperturbed parts of a sequence while computing the new optimal solution. To further reduce the computational overhead associated with evaluating the tolerance bounds, we present and exploit a statistical analysis of the matching procedure that estimates how many columns of the dynamic program corresponding to the matching problem are affected by a change in a preceding column. The resulting "reusable" Viterbi decoding algorithm can update a matching result in less than one-third, and as little as one-fifth, of the time required to compute a new match with a normal matching procedure, i.e., by running a Viterbi algorithm with the updated sequences against a base hidden Markov model.
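The idea of reusing an earlier Viterbi result after a sequence edit can be illustrated with a minimal Python sketch. It does not implement the relative node tolerance bound described in the abstract. Instead it substitutes a simpler stopping rule: after a single-symbol substitution, columns are recomputed from the edited position onward until a recomputed column differs from the stored one by the same constant in every state, at which point all later backpointers are unchanged and the later scores only need that constant added. All names (`viterbi`, `update`, `traceback`, and the helper functions) are hypothetical, and the sketch assumes finite log-scores; with floating-point scores the `tol` comparison is an approximation.

```python
def _first_column(log_init, log_emit, symbol):
    """Scores for position 0; there are no predecessors, so pointers are -1."""
    n = len(log_init)
    return [log_init[j] + log_emit[j][symbol] for j in range(n)], [-1] * n


def _step(prev, log_trans, log_emit, symbol):
    """One Viterbi recursion step: best predecessor and score for each state."""
    n = len(prev)
    col, ptr = [], []
    for j in range(n):
        best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
        col.append(prev[best] + log_trans[best][j] + log_emit[j][symbol])
        ptr.append(best)
    return col, ptr


def viterbi(log_init, log_trans, log_emit, obs):
    """Full pass over obs; returns (score columns, backpointer columns)."""
    col, ptr = _first_column(log_init, log_emit, obs[0])
    delta, back = [col], [ptr]
    for symbol in obs[1:]:
        col, ptr = _step(delta[-1], log_trans, log_emit, symbol)
        delta.append(col)
        back.append(ptr)
    return delta, back


def traceback(delta, back):
    """Recover the best state path and its score from stored columns."""
    state = max(range(len(delta[-1])), key=lambda j: delta[-1][j])
    score = delta[-1][state]
    path = [state]
    for t in range(len(delta) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1], score


def update(delta, back, log_init, log_trans, log_emit, obs, pos, symbol,
           tol=1e-9):
    """Apply obs[pos] = symbol, recomputing columns only until they realign.

    Returns (new_obs, new_delta, new_back, number_of_recomputed_columns).
    """
    obs = list(obs)
    obs[pos] = symbol
    delta = [col[:] for col in delta]   # work on copies of the stored result
    back = [ptr[:] for ptr in back]
    recomputed = 0
    for t in range(pos, len(obs)):
        if t == 0:
            col, ptr = _first_column(log_init, log_emit, obs[0])
        else:
            col, ptr = _step(delta[t - 1], log_trans, log_emit, obs[t])
        recomputed += 1
        shifts = [new - old for new, old in zip(col, delta[t])]
        delta[t], back[t] = col, ptr
        # A constant shift across states leaves every later argmax unchanged,
        # so the stored backpointers stay valid and scores just gain the shift.
        if all(abs(s - shifts[0]) <= tol for s in shifts):
            for later in delta[t + 1:]:
                for j in range(len(later)):
                    later[j] += shifts[0]
            break
    return obs, delta, back, recomputed
```

A caller would run `viterbi` once on the base sequence, keep `delta` and `back`, and then call `update` for each substitution, followed by `traceback` on the returned columns. Each recomputed column costs on the order of the number of states squared, while the shortcut costs only one addition per remaining cell, so the saving grows with how quickly the columns realign after the edit.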
Keywords :
Viterbi decoding; genetics; hidden Markov models; medical signal processing; statistical analysis; Viterbi decoding algorithm; Viterbi recursion; biological sequence analyses; computational biology; computational complexity; optimal minimum spanning tree; relative node tolerance; shortest path problem; Biological system modeling; Error correction; Evolution (biology); Genetic mutations; Proteins; Sequences; Viterbi algorithm; Dynamic programming; hidden Markov models (HMMs); minimum spanning tree; sensitivity analysis; shortest path
Journal_Title :
IEEE Journal of Selected Topics in Signal Processing
DOI :
10.1109/JSTSP.2008.924382