Title :
How genome complexity can explain the hardness of aligning reads to genomes
Author :
Vinhthuy Phan ; Shanshan Gao ; Quang Tran ; Nam S. Vo
Author_Institution :
Dept. of Comput. Sci., Univ. of Memphis, Memphis, TN, USA
Abstract :
Although it is known that aligning short reads to reference genomes becomes harder when those genomes contain complex repeat structures, there has been little effort to quantify this intuition. We investigated several measures of complexity, used 10 popular short-read aligners to align reads to a large number of diverse genomes, and found that, unlike existing notions of complexity, a proposed notion of length-sensitive measures correlated highly with the hardness of short-read alignment. This result enables speedy estimation of the hardness of alignment without aligning millions of reads to unknown genomes.
Keywords :
bioinformatics; genomics; information theory; complex repeat structures; complexity measures; genome complexity; genome read alignment; reference genomes; short read aligners; short read alignment; Accuracy; Bioinformatics; Complexity theory; Correlation; Genomics; Length measurement; Sequential analysis
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4799-5786-6
DOI :
10.1109/ICCABS.2014.6863916