DocumentCode
2737729
Title
Workshop: Bioinformatics pipeline for fosmid based molecular haplotype sequencing
Author
Duitama, Jorge ; Suk, Eun-Kyung ; Schulz, Sabrina ; McEwen, Gayle ; Huebsch, Thomas ; Hoehe, Margret
Author_Institution
Genetic Variation, Haplotypes & Genetics of Complex Disease, Max Planck Inst. for Mol. Genetics, Berlin, Germany
fYear
2011
fDate
3-5 Feb. 2011
Firstpage
269
Lastpage
269
Abstract
A new bioinformatics pipeline for fosmid-based analysis was developed by extending the standard SOLiD pipeline for NGS. The experimental approach starts by sequencing pools of up to 15,000 DNA molecules called fosmids. Each fosmid has an average length of 40 kb and is sampled at random from the genome. The pipeline includes a fosmid detection algorithm, which clusters SOLiD reads aligned to the reference genome according to a custom-made set of proximity rules. It also includes a module that makes homozygous allele calls in regions identified as potential fosmid locations. These allele calls are collected in a matrix for single individual haplotyping. The pipeline includes a new algorithm for this problem, which seeks a cut (a partition of the fosmids into two sets) consistent with their haplotype of origin. The algorithm reduces the problem to the well-known NP-complete problem Max-CUT, which was solved approximately by combining well-known heuristics. Finally, the algorithm calculates the consensus haplotypes, assuming that the cut is correct. Running the pipeline on 48 different pools phased 32,347 SNPs in 102 blocks on chromosome 22 of one individual, with a predicted switch error rate of about 1%.
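The haplotyping step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the edge weighting (mismatches minus matches between overlapping fosmids), the single-flip local search, the majority-vote consensus, and all function names are assumptions chosen for illustration. The abstract states only that the problem is reduced to Max-CUT and solved approximately with known heuristics.

```python
from itertools import combinations

def pair_weight(f, g):
    """Assumed weighting: mismatches minus matches over SNPs called in both fragments."""
    score = 0
    for a, b in zip(f, g):
        if a is not None and b is not None:
            score += 1 if a != b else -1
    return score

def local_search_cut(fragments):
    """Single-flip local search for Max-CUT (a stand-in for the paper's heuristics).

    Returns a list with one side label (0 or 1) per fragment.
    """
    n = len(fragments)
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = pair_weight(fragments[i], fragments[j])
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Moving v gains the weights to same-side vertices
            # and loses the weights to opposite-side vertices.
            gain = sum(w[v][u] if side[u] == side[v] else -w[v][u]
                       for u in range(n) if u != v)
            if gain > 0:
                side[v] ^= 1
                improved = True
    return side

def consensus(fragments, side):
    """Majority vote per SNP, assuming the cut is correct.

    Calls from side-1 fragments are complemented so that every
    fragment votes for the allele carried by the side-0 haplotype.
    """
    hap = []
    for k in range(len(fragments[0])):
        votes = 0
        for f, s in zip(fragments, side):
            if f[k] is not None:
                votes += 1 if (f[k] ^ s) == 1 else -1
        hap.append(1 if votes > 0 else 0)
    return hap, [1 - a for a in hap]

# Toy matrix: rows are fosmids, columns are heterozygous SNPs,
# entries are allele calls (0/1) or None where the fosmid has no call.
fragments = [
    [0, 1, 0, None],
    [1, 0, 1, None],
    [None, 1, 0, 0],
    [None, 0, 1, 1],
]
side = local_search_cut(fragments)
hap_a, hap_b = consensus(fragments, side)
```

Each accepted flip strictly increases the integer cut weight, so the search terminates at a local optimum. A local optimum need not be the maximum cut.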
Keywords
DNA; bioinformatics; cellular biophysics; computational complexity; genomics; molecular biophysics; molecular configurations; optimisation; polymorphism; DNA molecules; NP-complete problem; SNP; SOLiD pipeline; bioinformatics; chromosome; fosmid; genome; homozygous allele; max-CUT; molecular haplotype sequencing; proximity rules; switch error; Approximation algorithms; Bioinformatics; Biological cells; DNA; Genomics; Pipelines;
fLanguage
English
Publisher
ieee
Conference_Titel
Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
Conference_Location
Orlando, FL
Print_ISBN
978-1-61284-851-8
Type
conf
DOI
10.1109/ICCABS.2011.5729923
Filename
5729923