DocumentCode :
191009
Title :
How genome complexity can explain the hardness of aligning reads to genomes
Author :
Vinhthuy Phan ; Shanshan Gao ; Quang Tran ; Nam S. Vo
Author_Institution :
Dept. of Comput. Sci., Univ. of Memphis, Memphis, TN, USA
fYear :
2014
fDate :
2-4 June 2014
Firstpage :
1
Lastpage :
2
Abstract :
Although it is known that aligning short reads to reference genomes becomes harder when those genomes contain complex repeat structures, there has been little effort to quantify this intuition. We investigated several measures of complexity, used 10 popular short-read aligners to align reads to a large number of diverse genomes, and found that, unlike existing notions of complexity, the proposed length-sensitive measures correlated highly with the hardness of short-read alignment. This result enables fast estimation of alignment hardness for an unknown genome without aligning millions of reads to it.
Keywords :
bioinformatics; genomics; information theory; complex repeat structures; complexity measures; genome complexity; genome read alignment; reference genomes; short read aligners; short read alignment; Accuracy; Bioinformatics; Complexity theory; Correlation; Genomics; Length measurement; Sequential analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4799-5786-6
Type :
conf
DOI :
10.1109/ICCABS.2014.6863916
Filename :
6863916