• DocumentCode
    796888
  • Title

    SpeedHap: An Accurate Heuristic for the Single Individual SNP Haplotyping Problem with Many Gaps, High Reading Error Rate and Low Coverage

  • Author

    Genovese, Loredana M. ; Geraci, Filippo ; Pellegrini, Marco

  • Author_Institution
    Inst. for Inf. & Telematics, Italian Nat. Res. Council, Pisa
  • Volume
    5
  • Issue
    4
  • fYear
    2008
  • Firstpage
    492
  • Lastpage
    502
  • Abstract
    Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine, including diagnostics and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in the literature, and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are considered, no exact method of polynomial complexity is known. The problem is also hard to approximate with guarantees. Therefore, fast heuristics have been proposed. In this paper we describe SpeedHap, a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have a high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.
  • Keywords
    DNA; biology computing; cellular biophysics; computational complexity; molecular biophysics; molecular configurations; DNA variation; HapMap Project; SpeedHap; haplotype reconstruction; heuristic method; homologous human chromosomes; polynomial complexity; reading error rate; single individual SNP haplotyping problem; single nucleotide polymorphism; Algorithms; Biology and genetics; Algorithms; Base Sequence; Chromosome Mapping; DNA Mutational Analysis; Haplotypes; Humans; Molecular Sequence Data; Polymorphism, Single Nucleotide; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, DNA; Software;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2008.67
  • Filename
    4564438