DocumentCode :
1277761
Title :
A Fast Multiple Longest Common Subsequence (MLCS) Algorithm
Author :
Wang, Qingguo ; Korkin, Dmitry ; Shang, Yi
Author_Institution :
Dept. of Comput. Sci., Univ. of Missouri, Columbia, MO, USA
Volume :
23
Issue :
3
fYear :
2011
fDate :
3/1/2011
Firstpage :
321
Lastpage :
334
Abstract :
Finding the longest common subsequence (LCS) of multiple strings is an NP-hard problem with many applications in bioinformatics and computational genomics. Although significant effort has gone into the problem and its special cases, the growing size and complexity of biological data require more efficient methods that apply to an arbitrary number of strings. In this paper, we present a new algorithm for the general case of the multiple LCS (MLCS) problem, i.e., finding an LCS of any number of strings, together with its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points. For three strings, our algorithm performs as well as the fastest existing MLCS algorithm designed specifically for that case. For more than three strings, it is significantly faster than the best existing sequential methods, running up to two to three orders of magnitude faster on large problems. Finally, we present an efficient parallel implementation of the algorithm. Evaluating the parallel algorithm on a benchmark set of both random and biological sequences shows near-linear speedup relative to the sequential algorithm.
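The dominant point approach mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a naive quadratic filter (the hypothetical helper `minima`) in place of the paper's divide-and-conquer computation of dominant points, and all function names are hypothetical. A match point is a tuple of positions, one per string, holding the same character; level-k dominant points are the non-dominated successors of level-(k-1) points, and the number of non-empty levels equals the LCS length.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]


def build_successor_tables(strings: List[str], alphabet: List[str]) -> List[Dict[str, List[int]]]:
    """For each string s and character c, table[c][j] is the smallest index
    k >= j with s[k] == c, or -1 if there is none (j ranges over 0..len(s))."""
    tables = []
    for s in strings:
        table = {}
        for c in alphabet:
            nxt = [-1] * (len(s) + 1)
            for j in range(len(s) - 1, -1, -1):
                nxt[j] = j if s[j] == c else nxt[j + 1]
            table[c] = nxt
        tables.append(table)
    return tables


def minima(points: List[Point]) -> List[Point]:
    """Keep only points not dominated by another point (q dominates p if
    q <= p componentwise and q != p). Naive O(n^2) filter, an assumed
    stand-in for the paper's divide-and-conquer step."""
    unique = set(points)
    return [
        p for p in unique
        if not any(q != p and all(qi <= pi for qi, pi in zip(q, p)) for q in unique)
    ]


def mlcs(strings: List[str]) -> str:
    """Return one longest common subsequence of all input strings."""
    if not strings:
        return ""
    # Only characters present in every string can appear in a common subsequence.
    alphabet = sorted(set(strings[0]).intersection(*map(set, strings[1:])))
    tables = build_successor_tables(strings, alphabet)
    origin: Point = tuple(-1 for _ in strings)  # virtual point before all strings
    parent: Dict[Point, Point] = {}
    level: List[Point] = [origin]
    while True:
        # Successor of p on character c: next occurrence of c after p in every string.
        candidates: Dict[Point, Point] = {}
        for p in level:
            for c in alphabet:
                succ = tuple(t[c][pi + 1] for t, pi in zip(tables, p))
                if -1 not in succ and succ not in candidates:
                    candidates[succ] = p
        if not candidates:
            break
        level = minima(list(candidates))  # dominant points of the next level
        for q in level:
            parent[q] = candidates[q]
    # Walk back from any point of the last level to the origin.
    chars = []
    p = level[0]
    while p != origin:
        chars.append(strings[0][p[0]])
        p = parent[p]
    return "".join(reversed(chars))


print(mlcs(["abcde", "ace", "bce"]))  # ce
```

When several LCSs exist, which one is returned depends on the iteration order inside `minima`; only its length is guaranteed.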
Keywords :
computational complexity; divide and conquer methods; parallel algorithms; sequences; MLCS; NP-hard problem; bioinformatics; biological data; computational genomics; divide-and-conquer technique; dominant point approach; multiple longest common subsequence algorithm; multiple strings; parallel algorithm; Algorithm design and analysis; Biology; Complexity theory; Dynamic programming; Heuristic algorithms; Parallel algorithms; Program processors; Longest common subsequence (LCS); divide and conquer; dominant point method; dynamic programming; multiple longest common subsequence (MLCS); multithreading; parallel processing
fLanguage :
English
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.123
Filename :
5530316