DocumentCode
20667
Title
A New Progressive Algorithm for a Multiple Longest Common Subsequences Problem and Its Efficient Parallelization
Author
Yang, Jiaoyun ; Xu, Yun ; Sun, Guangzhong ; Shang, Yi
Author_Institution
University of Science and Technology of China, Hefei
Volume
24
Issue
5
fYear
2013
fDate
May-13
Firstpage
862
Lastpage
870
Abstract
The multiple longest common subsequence (MLCS) problem, which is related to the measurement of sequence similarity, is one of the fundamental problems in many fields. Because MLCS is NP-hard, finding a good approximate solution within a reasonable time is important for solving large problems in practice. In this paper, we present a new progressive algorithm, Pro-MLCS, based on the dominant point approach. Pro-MLCS finds an approximate solution quickly and then progressively generates better solutions until it obtains the optimal one. Pro-MLCS employs three new techniques: 1) a new heuristic function for prioritizing candidate points; 2) a novel d-index-tree data structure for efficient computation of dominant points; and 3) a new pruning method using an upper bound function and approximate solutions. Experimental results show that Pro-MLCS obtains the first approximate solution almost instantly and needs only a very small fraction (e.g., 3 percent) of the entire running time to find the optimal solution. Compared to existing state-of-the-art algorithms, Pro-MLCS finds better solutions in much shorter time, running one to two orders of magnitude faster. In addition, two parallel versions of Pro-MLCS are developed: DPro-MLCS for distributed memory architectures and DSDPro-MLCS for hierarchical distributed shared memory architectures. Both parallel algorithms efficiently utilize parallel computing resources and achieve nearly linear speedups. They also have a desirable progressiveness property: they find better solutions in shorter time when given more hardware resources.
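The dominant point approach that Pro-MLCS builds on can be illustrated with a minimal, exact sketch. A point is a tuple of positions, one per sequence, where all sequences hold the same character; level k holds the points reachable by a common subsequence of length k, and only dominant (componentwise-minimal) points are kept, since any extension of a dominated point is also available from its dominator. This is NOT Pro-MLCS: it expands levels breadth-first, omits the heuristic function, the d-index-tree, and the upper-bound pruning, and uses a naive quadratic dominance filter. The names `build_successor_tables`, `dominant_points`, and `mlcs` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, ...]  # hypothetical: one matched position per sequence


def build_successor_tables(seqs: List[str], alphabet: List[str]) -> List[Dict[str, List[int]]]:
    """For each sequence s, table[c][j] = smallest index >= j with s[idx] == c, or -1."""
    tables = []
    for s in seqs:
        table = {c: [-1] * (len(s) + 1) for c in alphabet}
        for j in range(len(s) - 1, -1, -1):
            for c in alphabet:
                table[c][j] = table[c][j + 1]
            table[s[j]][j] = j
        tables.append(table)
    return tables


def dominant_points(points: Iterable[Point]) -> List[Point]:
    """Keep only points not dominated componentwise (naive O(n^2) filter).

    Sorting lexicographically guarantees every dominator of p is seen before p.
    """
    result: List[Point] = []
    for p in sorted(set(points)):
        if not any(all(a <= b for a, b in zip(r, p)) for r in result):
            result.append(p)
    return result


def mlcs(seqs: List[str]) -> str:
    """Exact MLCS by level-wise expansion of dominant points (illustrative sketch)."""
    alphabet = sorted(set().union(*seqs))
    tables = build_successor_tables(seqs, alphabet)
    # Virtual start point before the first character of every sequence.
    layer: Dict[Point, str] = {tuple(-1 for _ in seqs): ""}
    best = ""
    while True:
        candidates: Dict[Point, str] = {}
        for p, sub in layer.items():
            for c in alphabet:
                # Successor for character c: next occurrence after p_i in each sequence.
                q = tuple(t[c][pi + 1] for t, pi in zip(tables, p))
                if -1 in q:
                    continue  # c does not occur later in some sequence
                candidates.setdefault(q, sub + c)
        if not candidates:
            return best  # the last non-empty level gives an MLCS
        layer = {q: candidates[q] for q in dominant_points(candidates)}
        best = next(iter(layer.values()))


print(mlcs(["ACGT", "AGT", "AT"]))  # AT
```

Each level only stores dominant points, which is the property that makes this family of methods tractable; Pro-MLCS, per the abstract above, replaces the breadth-first order with heuristic prioritization so that approximate answers appear early.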
Keywords
Approximation algorithms; Complexity theory; DNA; Data structures; Heuristic algorithms; Memory architecture; Parallel algorithms; Multiple longest common subsequence problem (MLCS); SMP cluster; branch-and-bound search; distributed memory architecture; progressive algorithm; skyline problem;
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2012.202
Filename
6226388