• DocumentCode
    1764788
  • Title

    BM-SNP: A Bayesian Model for SNP Calling Using High Throughput Sequencing Data

  • Author

    Yanxun Xu ; Xiaofeng Zheng ; Yuan Yuan ; Marcos R. Estecio ; Jean-Pierre Issa ; Peng Qiu ; Yuan Ji ; Shoudan Liang

  • Author_Institution
    Div. of Stat. & Sci. Comput., Univ. of Texas at Austin, Austin, TX, USA
  • Volume
    11
  • Issue
    6
  • fYear
    2014
  • fDate
    Nov.-Dec. 1 2014
  • Firstpage
    1038
  • Lastpage
    1044
  • Abstract
    A single-nucleotide polymorphism (SNP) is a single base change in the DNA sequence and is the most common type of polymorphism. Detection and annotation of SNPs are among the central topics in biomedical research, as SNPs are believed to play important roles in the manifestation of phenotypic events, such as disease susceptibility. To take full advantage of next-generation sequencing (NGS) technology, we propose a Bayesian approach, BM-SNP, that identifies SNPs through posterior inference on NGS data. In particular, BM-SNP computes the posterior probability of nucleotide variation at each covered genomic position using the contents and frequency of the mapped short reads. A position with a high posterior probability of nucleotide variation is flagged as a potential SNP. We apply BM-SNP to two cell-line NGS data sets, and the results show a high ratio of overlap (>95 percent) with the dbSNP database. Compared with MAQ, BM-SNP identifies more SNPs that are in dbSNP, with higher quality. The SNPs that are called only by BM-SNP and are not in dbSNP may serve as new discoveries. The proposed BM-SNP method integrates information from multiple aspects of NGS data and therefore achieves high detection power. BM-SNP is fast, capable of processing whole-genome data at 20-fold average coverage in a short amount of time.
  • Keywords
    Bayes methods; DNA; biology computing; cellular biophysics; diseases; genomics; molecular biophysics; molecular configurations; polymorphism; BM-SNP; Bayesian model; DNA sequence; SNP calling; SNPs annotation; SNPs detection; biomedical research; cell-line NGS data; covered genomic position; dbSNP database; disease susceptibility; genome data; high posterior probability; high throughput sequencing data; mapped short reads contents; mapped short reads frequency; next-generation sequencing technology; nucleotide variation; phenotypic events; posterior inference; single-nucleotide polymorphism; Bayes methods; Bioinformatics; Computational biology; Computational modeling; DNA; Genomics; Histograms; Sequential analysis; Statistical analysis; Bayesian; Markov chain Monte Carlo (MCMC); false discovery rate (FDR); next-generation sequencing (NGS); single-nucleotide polymorphism (SNP);
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2321407
  • Filename
    6809195