• DocumentCode
    2737729
  • Title

    Workshop: Bioinformatics pipeline for fosmid based molecular haplotype sequencing

  • Author

    Duitama, Jorge ; Suk, Eun-Kyung ; Schulz, Sabrina ; McEwen, Gayle ; Huebsch, Thomas ; Hoehe, Margret

  • Author_Institution
    Genetic Variation, Haplotypes & Genetics of Complex Disease, Max Planck Inst. for Mol. Genetics, Berlin, Germany
  • fYear
    2011
  • fDate
    3-5 Feb. 2011
  • Firstpage
    269
  • Lastpage
    269
  • Abstract
    A new bioinformatics pipeline for fosmid-based analysis was developed by extending the standard SOLiD pipeline for NGS. The experimental approach starts by sequencing pools of up to 15,000 DNA molecules called fosmids. Each fosmid has an average length of 40 kb and is sampled at random from the genome. The pipeline includes a fosmid detection algorithm that clusters SOLiD reads aligned to the reference genome according to a custom-made set of proximity rules. It also includes a module that makes homozygous allele calls in regions identified as potential fosmid locations. These allele calls are collected in a matrix for single individual haplotyping. The pipeline includes a new algorithm for this problem, which tries to find the cut of fosmids that is consistent with their haplotype origin. The algorithm reduces the problem to the well-known NP-complete problem Max-CUT, which is solved approximately by combining well-known heuristics. Finally, the algorithm calculates the consensus haplotypes under the assumption that the cut is correct. Running the pipeline on 48 different pools phased 32,347 SNPs in 102 blocks on chromosome 22 of one individual, with a predicted switch error rate of about 1%.
  • Keywords
    DNA; bioinformatics; cellular biophysics; computational complexity; genomics; molecular biophysics; molecular configurations; optimisation; polymorphism; DNA molecules; NP-complete problem; SNP; SOLiD pipeline; bioinformatics; chromosome; fosmid; genome; homozygous allele; max-CUT; molecular haplotype sequencing; proximity rules; switch error; Approximation algorithms; Bioinformatics; Biological cells; DNA; Genomics; Pipelines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
  • Conference_Location
    Orlando, FL
  • Print_ISBN
    978-1-61284-851-8
  • Type

    conf

  • DOI
    10.1109/ICCABS.2011.5729923
  • Filename
    5729923
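
The haplotyping step summarized in the abstract (allele-call matrix, cut of fosmids, consensus haplotypes) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Max-CUT heuristics were combined, so a plain single-flip local search stands in for them here. The function names (`edge_weight`, `local_search_cut`, `consensus`), the edge-weight definition (mismatches minus matches), and the matrix encoding (one row per fosmid, `0`/`1` allele calls, `None` where a SNP is not covered) are all assumptions made for illustration.

```python
from itertools import combinations


def edge_weight(f1, f2):
    """Mismatches minus matches over the SNPs covered by both fragments.

    Assumed scoring: a positive weight suggests the two fragments come
    from different haplotypes, a negative weight from the same one.
    """
    w = 0
    for a, b in zip(f1, f2):
        if a is None or b is None:
            continue  # SNP not covered by both fragments
        w += 1 if a != b else -1
    return w


def local_search_cut(fragments):
    """Greedy single-flip local search for a heavy cut (a stand-in heuristic).

    Returns a list with one entry per fragment: 0 or 1, the side of the cut.
    """
    n = len(fragments)
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = edge_weight(fragments[i], fragments[j])

    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Moving v adds its same-side edges to the cut and removes
            # its cross-side edges from it.
            gain = sum(
                w[v][u] if side[u] == side[v] else -w[v][u]
                for u in range(n)
                if u != v
            )
            if gain > 0:  # every flip strictly increases the cut weight
                side[v] ^= 1
                improved = True
    return side


def consensus(fragments, side, which):
    """Per-SNP majority vote over the fragments on one side of the cut.

    Ties resolve to allele 0; SNPs with no coverage on this side stay None.
    """
    hap = []
    for k in range(len(fragments[0])):
        calls = [
            f[k]
            for f, s in zip(fragments, side)
            if s == which and f[k] is not None
        ]
        if not calls:
            hap.append(None)
        else:
            hap.append(1 if 2 * sum(calls) > len(calls) else 0)
    return hap
```

Given a small matrix of overlapping fragments, `local_search_cut` assigns each row to one of two sides, and calling `consensus` once per side yields the two candidate haplotypes. Local search only guarantees a locally optimal cut, so a real pipeline would likely need restarts or stronger heuristics.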