DocumentCode
3714361
Title
A new algorithm for “the LCS problem” with application in compressing genome resequencing data
Author
Richard Beal;Tazin Afrin;Aliya Farheen;Don Adjeroh
Author_Institution
Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, US
fYear
2015
Firstpage
69
Lastpage
74
Abstract
The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. First, we present a new algorithm for the LCS problem. Then, we introduce an LCS-motivated reference-based compression scheme using the components of the LCS, rather than the LCS itself. For the Homo sapiens genome (original size 3,080,436,051 bytes), our proposed scheme compressed the genome to 5,267,656 bytes. This can be compared with the previous best results of 19,666,791 bytes (Wang and Zhang, 2011) and 17,971,030 bytes (Pinho, Pratas, and Garcia, 2011). Thus, our compressed output is about 3.73 and 3.41 times smaller, respectively, than the output of these state-of-the-art reference-based compression algorithms.
Keywords
"Lead","Genomics","Bioinformatics","Yttrium"
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BIBM.2015.7359657
Filename
7359657