• DocumentCode
    191009
  • Title

    How genome complexity can explain the hardness of aligning reads to genomes

  • Author

    Vinhthuy Phan ; Shanshan Gao ; Quang Tran ; Nam S. Vo

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Memphis, Memphis, TN, USA
  • fYear
    2014
  • fDate
    2-4 June 2014
  • Firstpage
    1
  • Lastpage
    2
  • Abstract
    Although it is known that aligning short reads to reference genomes becomes harder if such genomes are embedded with complex repeat structures, there has been little effort to quantify this intuition. We investigated several measures of complexity, employed 10 popular short-read aligners to align reads to a large number of diverse genomes, and found that, unlike existing notions of complexity, a proposed notion of length-sensitive measures correlated highly with the hardness of short-read alignment. This result enables speedy estimation of the hardness of alignment without aligning millions of reads to unknown genomes.
  • Keywords
    bioinformatics; genomics; information theory; complex repeat structures; complexity measures; genome complexity; genome read alignment; reference genomes; short read aligners; short read alignment; Accuracy; Bioinformatics; Complexity theory; Correlation; Genomics; Length measurement; Sequential analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4799-5786-6
  • Type

    conf

  • DOI
    10.1109/ICCABS.2014.6863916
  • Filename
    6863916