• DocumentCode
    3714361
  • Title
    A new algorithm for “the LCS problem” with application in compressing genome resequencing data
  • Author
    Richard Beal;Tazin Afrin;Aliya Farheen;Don Adjeroh
  • Author_Institution
    Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, US
  • fYear
    2015
  • Firstpage
    69
  • Lastpage
    74
  • Abstract
    The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. First, we present a new algorithm for the LCS problem. Then, we introduce an LCS-motivated reference-based compression scheme using the components of the LCS, rather than the LCS itself. For the Homo sapiens genome (original size 3,080,436,051 bytes), our proposed scheme compressed the genome to 5,267,656 bytes. This can be compared with the previous best results of 19,666,791 bytes (Wang and Zhang, 2011) and 17,971,030 bytes (Pinho, Pratas, and Garcia, 2011). Thus, our compressed output is about 3.73 and 3.41 times smaller, respectively, than the outputs of these state-of-the-art reference-based compression algorithms.
  • Keywords
    "Lead","Genomics","Bioinformatics","Yttrium"
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BIBM.2015.7359657
  • Filename
    7359657
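
The improvement factors quoted in the abstract can be checked directly from the byte counts it reports. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative assumptions, and nothing here reproduces the authors' LCS algorithm or compression scheme.

```python
# Sizes in bytes, copied from the abstract of this record.
ORIGINAL_SIZE = 3_080_436_051   # Homo sapiens genome, uncompressed
PROPOSED_SIZE = 5_267_656       # output of the proposed scheme
PRIOR_SIZES = {
    "Wang and Zhang (2011)": 19_666_791,
    "Pinho, Pratas, and Garcia (2011)": 17_971_030,
}


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


# Factor by which each earlier result exceeds the proposed output size.
improvements = {
    name: size_ratio(size, PROPOSED_SIZE) for name, size in PRIOR_SIZES.items()
}

# Overall ratio of the original genome size to the proposed output size.
overall = size_ratio(ORIGINAL_SIZE, PROPOSED_SIZE)

for name, factor in improvements.items():
    print(f"{name}: {factor:.2f}x the size of the proposed output")
print(f"Original vs. proposed output: about {overall:.0f}x")
```

Rounded to two decimals, the two factors agree with the 3.73 and 3.41 figures stated in the abstract.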