DocumentCode
191009
Title
How genome complexity can explain the hardness of aligning reads to genomes
Author
Vinhthuy Phan ; Shanshan Gao ; Quang Tran ; Nam S. Vo
Author_Institution
Dept. of Comput. Sci., Univ. of Memphis, Memphis, TN, USA
fYear
2014
fDate
2-4 June 2014
Firstpage
1
Lastpage
2
Abstract
Although aligning short reads to reference genomes is known to become harder when those genomes contain complex repeat structures, there has been little effort to quantify this intuition. We investigated several measures of complexity, employed 10 popular short-read aligners to align reads to a large number of diverse genomes, and found that, unlike existing notions of complexity, a proposed notion of length-sensitive measures correlated highly with the hardness of short-read alignment. This result enables speedy estimation of the hardness of alignment without aligning millions of reads to unknown genomes.
Keywords
bioinformatics; genomics; information theory; complex repeat structures; complexity measures; genome complexity; genome read alignment; reference genomes; short read aligners; short read alignment; Accuracy; Bioinformatics; Complexity theory; Correlation; Genomics; Length measurement; Sequential analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location
Miami, FL
Print_ISBN
978-1-4799-5786-6
Type
conf
DOI
10.1109/ICCABS.2014.6863916
Filename
6863916