  • DocumentCode
    20667
  • Title
    A New Progressive Algorithm for a Multiple Longest Common Subsequences Problem and Its Efficient Parallelization
  • Author
    Yang, Jiaoyun; Xu, Yun; Sun, Guangzhong; Shang, Yi
  • Author_Institution
    University of Science and Technology of China, Hefei
  • Volume
    24
  • Issue
    5
  • fYear
    2013
  • fDate
    May-13
  • Firstpage
    862
  • Lastpage
    870
  • Abstract
    The multiple longest common subsequence (MLCS) problem, which is related to the measurement of sequence similarity, is a fundamental problem in many fields. Because the problem is NP-hard, finding a good approximate solution within a reasonable time is important for solving large instances in practice. In this paper, we present a new progressive algorithm, Pro-MLCS, based on the dominant point approach. Pro-MLCS can find an approximate solution quickly and then progressively generate better solutions until it obtains the optimal one. Pro-MLCS employs three new techniques: 1) a new heuristic function for prioritizing candidate points; 2) a novel d-index-tree data structure for efficient computation of dominant points; and 3) a new pruning method that uses an upper bound function and approximate solutions. Experimental results show that Pro-MLCS can obtain the first approximate solution almost instantly and needs only a very small fraction of the entire running time, e.g., 3 percent, to find the optimal solution. Compared with existing state-of-the-art algorithms, Pro-MLCS can find better solutions in much shorter time, one to two orders of magnitude faster. In addition, two parallel versions of Pro-MLCS are developed: DPro-MLCS for distributed memory architectures and DSDPro-MLCS for hierarchical distributed shared memory architectures. Both parallel algorithms efficiently utilize parallel computing resources and achieve nearly linear speedups. They also have a desirable progressiveness property: they find better solutions in shorter time when given more hardware resources.
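    The dominant point approach that Pro-MLCS builds on can be illustrated with a minimal exact sketch. This is not Pro-MLCS: it omits the heuristic function, the d-index-tree and the upper-bound pruning, and all names (`mlcs`, `next_tables`, `successors`, `minimal_points`) are hypothetical. It assumes the common formulation in which a match point is a tuple holding one position per sequence where all sequences carry the same character, and the dominant points of level k are the componentwise-minimal successors of the level k-1 dominant points; the number of non-empty levels is the MLCS length.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, ...]  # one match position per input sequence


def next_tables(seq: str, alphabet: Sequence[str]) -> List[Dict[str, int]]:
    """table[k][c] = smallest j >= k with seq[j] == c, or -1 if there is none."""
    n = len(seq)
    table: List[Dict[str, int]] = [{} for _ in range(n + 1)]
    table[n] = {c: -1 for c in alphabet}
    for k in range(n - 1, -1, -1):
        table[k] = dict(table[k + 1])
        table[k][seq[k]] = k
    return table


def successors(p: Point, tables, alphabet) -> List[Tuple[str, Point]]:
    """For each character c, the nearest match point strictly after p, if any."""
    out = []
    for c in alphabet:
        q = []
        for i, t in enumerate(tables):
            j = t[p[i] + 1].get(c, -1)
            if j < 0:  # c does not occur again in sequence i
                break
            q.append(j)
        else:
            out.append((c, tuple(q)))
    return out


def minimal_points(points) -> List[Point]:
    """Keep points not dominated by another point (q <= p in every coordinate)."""
    pts = set(points)
    return [p for p in pts
            if not any(q != p and all(a <= b for a, b in zip(q, p)) for q in pts)]


def mlcs(seqs: List[str]) -> str:
    """Return one multiple longest common subsequence of seqs (exact, level by level)."""
    alphabet = sorted(set.intersection(*(set(s) for s in seqs)))
    tables = [next_tables(s, alphabet) for s in seqs]
    level: Dict[Point, str] = {tuple(-1 for _ in seqs): ""}  # virtual start point
    while True:
        cand: Dict[Point, str] = {}
        for p, sub in level.items():
            for c, q in successors(p, tables, alphabet):
                cand.setdefault(q, sub + c)
        if not cand:  # no point can be extended, so this level is the last one
            return next(iter(level.values()))
        # Dominant points of the next level: minimal elements of all successors.
        level = {q: cand[q] for q in minimal_points(cand)}


print(mlcs(["AXBYC", "ABZC", "QABC"]))  # ABC
```

    The pairwise minima filter above costs O(m^2 d) per level for m candidates and d sequences; computing dominant points efficiently is the role the abstract assigns to the d-index-tree, whose design is not reproduced here.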
  • Keywords
    Approximation algorithms; Complexity theory; DNA; Data structures; Heuristic algorithms; Memory architecture; Parallel algorithms; Multiple longest common subsequence problem (MLCS); SMP cluster; branch-and-bound search; distributed memory architecture; progressive algorithm; skyline problem;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2012.202
  • Filename
    6226388