Title :
Optimal haplotype assembly with statistical pruning
Author :
Das, Shreepriya ; Vikalo, Haris
Author_Institution :
ECE Dept., Univ. of Texas at Austin, Austin, TX, USA
Abstract :
Solving the haplotype assembly problem by optimizing the commonly used minimum error correction criterion is known to be NP-hard. For this reason, suboptimal heuristics are often used in practice. In this paper, we propose a novel method for optimal haplotype assembly that is based on depth-first branch-and-bound search of the solution space. Our scheme is inspired by the sphere decoding algorithms used heavily in the field of digital communications. Using statistical information about errors in the sequencing data, we constrain the search of the haplotype space and quickly find the optimal solution to the haplotype assembly problem. Theoretical analysis of the expected complexity of the algorithm shows that optimal haplotype assembly is practically feasible for haplotype blocks of the moderate lengths typically obtained using present-day high-throughput sequencers. The scheme is then tested on 1000 Genomes Project experimental data to verify the efficacy of the proposed method.
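As a rough illustration of the search described in the abstract, the following Python sketch runs a depth-first branch-and-bound over haplotypes under the minimum error correction (MEC) criterion. It is not the authors' algorithm: the statistical pruning based on sequencing error rates is not reproduced, and only the simplest bound is used (a branch is cut once its partial cost reaches the best complete cost found so far). The function name `mec_branch_and_bound`, the `Read` type, and the input encoding (0/1 alleles, `None` for uncovered sites) are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# One allele (0 or 1) per SNP site; None where the read does not cover the site.
Read = List[Optional[int]]


def mec_branch_and_bound(reads: List[Read], n_sites: int) -> Tuple[List[int], int]:
    """Return one haplotype minimizing the MEC cost, and that cost.

    Hypothetical helper for illustration. Each read is charged the smaller of
    its mismatch counts against the haplotype and against its complement.
    """
    best_cost = float("inf")
    best_hap: List[int] = []

    def dfs(pos: int, hap: List[int], d_h: List[int], d_c: List[int]) -> None:
        nonlocal best_cost, best_hap
        # Partial cost over the sites assigned so far. Mismatch counts never
        # decrease as sites are added, so this is a valid lower bound.
        cost = sum(min(a, b) for a, b in zip(d_h, d_c))
        if cost >= best_cost:
            return  # prune: this branch cannot beat the incumbent
        if pos == n_sites:
            best_cost, best_hap = cost, hap
            return
        # A haplotype and its complement have the same cost, so the first
        # site is fixed to 0 to avoid exploring mirrored solutions.
        for allele in ((0,) if pos == 0 else (0, 1)):
            new_h = [d + (r[pos] is not None and r[pos] != allele)
                     for d, r in zip(d_h, reads)]
            new_c = [d + (r[pos] is not None and r[pos] == allele)
                     for d, r in zip(d_c, reads)]
            dfs(pos + 1, hap + [allele], new_h, new_c)

    dfs(0, [], [0] * len(reads), [0] * len(reads))
    return best_hap, best_cost
```

For instance, calling it on the five reads `[0, 0, 1]`, `[1, 1, 0]`, `[0, 0, None]`, `[None, 1, 0]`, `[0, 1, 1]` over three sites yields the haplotype `[0, 0, 1]` with MEC cost 1, the single error being the middle allele of the last read. The worst-case running time of this sketch remains exponential in the number of sites, consistent with the NP-hardness noted in the abstract.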
Keywords :
biology computing; computational complexity; genomics; statistical analysis; NP-hard; digital communications; genomes project experimental data; haplotype assembly problem; high throughput sequencers; minimum error correction criterion; optimal haplotype assembly; sequencing data; sphere decoding algorithms; statistical information; statistical pruning; suboptimal heuristics; Assembly; Bioinformatics; Biological cells; Complexity theory; Genomics; Sequential analysis; Signal processing algorithms;
Conference_Title :
Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on
Conference_Location :
Atlanta, GA
DOI :
10.1109/GlobalSIP.2014.7032339