DocumentCode :
20667
Title :
A New Progressive Algorithm for a Multiple Longest Common Subsequences Problem and Its Efficient Parallelization
Author :
Yang, Jiaoyun ; Xu, Yun ; Sun, Guangzhong ; Shang, Yi
Author_Institution :
University of Science and Technology of China, Hefei
Volume :
24
Issue :
5
fYear :
2013
fDate :
May 2013
Firstpage :
862
Lastpage :
870
Abstract :
The multiple longest common subsequence (MLCS) problem, which is related to the measurement of sequence similarity, is one of the fundamental problems in many fields. Because the problem is NP-hard, finding a good approximate solution within a reasonable time is important for solving large-size problems in practice. In this paper, we present a new progressive algorithm, Pro-MLCS, based on the dominant point approach. Pro-MLCS can find an approximate solution quickly and then progressively generate better solutions until it obtains the optimal one. Pro-MLCS employs three new techniques: 1) a new heuristic function for prioritizing candidate points; 2) a novel $d$-index-tree data structure for efficient computation of dominant points; and 3) a new pruning method using an upper bound function and approximate solutions. Experimental results show that Pro-MLCS can obtain the first approximate solution almost instantly and needs only a very small fraction, e.g., 3 percent, of the entire running time to get the optimal solution. Compared to existing state-of-the-art algorithms, Pro-MLCS can find better solutions in much shorter time, one to two orders of magnitude faster. In addition, two parallel versions of Pro-MLCS are developed: DPro-MLCS for distributed memory architecture and DSDPro-MLCS for hierarchical distributed shared memory architecture. Both parallel algorithms can efficiently utilize parallel computing resources and achieve nearly linear speedups. They also have a desirable progressiveness property: finding better solutions in shorter time when given more hardware resources.
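The dominant point idea named in the abstract can be illustrated with a small sketch. This is not Pro-MLCS: it has no heuristic ordering, no $d$-index-tree, and no upper-bound pruning, and it is exact only because it searches exhaustively on small inputs. The names `dominates`, `mlcs`, `successors`, and `best` are hypothetical and chosen only for this illustration. A match point is a tuple of positions, one per sequence, that all hold the same symbol; a point is kept as a successor only if no other candidate precedes it in every sequence.

```python
from functools import lru_cache


def dominates(p, q):
    """True if point p is at or before q in every coordinate and p != q."""
    return p != q and all(a <= b for a, b in zip(p, q))


def mlcs(seqs):
    """Return one longest common subsequence of a small list of strings.

    Illustrative only: exhaustive search over non-dominated match points.
    """
    seqs = tuple(seqs)
    # Only symbols present in every sequence can appear in the answer.
    alphabet = sorted(set(seqs[0]).intersection(*map(set, seqs[1:])))

    def successors(point):
        # For each symbol, take its earliest occurrence strictly after
        # `point` in every sequence; skip symbols missing from any suffix.
        cands = {}
        for ch in alphabet:
            nxt = tuple(s.find(ch, p + 1) for s, p in zip(seqs, point))
            if -1 not in nxt:
                cands[nxt] = ch
        # Keep only the non-dominated candidates (the dominant points).
        return [(q, ch) for q, ch in cands.items()
                if not any(dominates(r, q) for r in cands)]

    @lru_cache(maxsize=None)
    def best(point):
        # Longest common subsequence that starts strictly after `point`.
        result = ""
        for q, ch in successors(point):
            cand = ch + best(q)
            if len(cand) > len(result):
                result = cand
        return result

    # Start just before the first position of every sequence.
    return best(tuple(-1 for _ in seqs))


print(mlcs(["ACGTAC", "CGTTAC", "GCGTAC"]))
```

Discarding dominated candidates is safe here because any common subsequence that begins at a dominated point can be extended by first visiting the point that dominates it. A progressive method in the spirit of the abstract would instead explore the most promising points first and report improving solutions as it goes.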
Keywords :
Approximation algorithms; Complexity theory; DNA; Data structures; Heuristic algorithms; Memory architecture; Parallel algorithms; Multiple longest common subsequence problem (MLCS); SMP cluster; branch-and-bound search; distributed memory architecture; progressive algorithm; skyline problem;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2012.202
Filename :
6226388
Link To Document :