DocumentCode :
2039278
Title :
MiB: A comparative assembly processing pipeline
Author :
Wajid, Bilal ; Serpedin, Erchin ; Nounou, M. ; Nounou, H.
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
fYear :
2012
fDate :
2-4 Dec. 2012
Firstpage :
86
Lastpage :
89
Abstract :
This paper introduces MiB, a comparative genome assembly pipeline built on three key steps. The first step chooses the best reference sequence using the Minimum Description Length (MDL) principle. The MDL principle not only chooses the best reference sequence (model) but also fine-tunes the model for a better assembly by rectifying all the inversions and removing most of the insertions from the reference sequence. The MDL principle also identifies which reads align to the reference sequence. The second stage takes the reads that did not align to the reference sequence as input to a de Bruijn graph-based algorithm that Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places (IDITAP). The last stage uses Bayesian Estimation for Comparative Assembly (BECA). BECA uses quality (Q-) values to obtain the probabilities of the base calls for every read and then exploits them to find the best alignments and the consensus sequence. MiB, named after MDL-IDITAP-BECA, therefore aims to take the optimal reference sequence and the set of reads from the unassembled genome and transform the reference sequence into the novel genome by removing or rectifying four types of mutation: inversions and insertions using MDL, deletions using IDITAP, and Single Nucleotide Polymorphisms (SNPs) using BECA. Preliminary tests of the proposed framework gave promising results.
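To illustrate the idea of turning Q-values into base-call probabilities, here is a minimal Python sketch. It assumes the Q-values are Phred-scaled, so that P(error) = 10^(-Q/10), and it picks a consensus base for a single aligned column with a simple Bayesian rule. This is not the BECA algorithm itself, whose details the abstract does not give. The function names, the uniform prior over A, C, G, T, and the equal split of error probability among the three other bases are all assumptions made for this example.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def column_consensus(calls):
    """Return (best_base, posterior) for one aligned column.

    calls: list of (base, phred_q) pairs, one per read covering the column.
    Assumptions (illustrative only): uniform prior over A, C, G, T, and a
    miscalled base is equally likely to be any of the other three bases.
    """
    bases = "ACGT"
    likelihood = {b: 1.0 for b in bases}
    for base, q in calls:
        p_err = phred_to_error_prob(q)
        for b in bases:
            # If the true base is b, the read is right with prob 1 - p_err,
            # otherwise it shows one specific wrong base with prob p_err / 3.
            likelihood[b] *= (1 - p_err) if b == base else p_err / 3
    total = sum(likelihood.values())
    posterior = {b: likelihood[b] / total for b in bases}
    best = max(posterior, key=posterior.get)
    return best, posterior


if __name__ == "__main__":
    # One high-quality 'A' call against two low-quality 'C' calls.
    best, posterior = column_consensus([("A", 40), ("C", 5), ("C", 5)])
    print(best, posterior)
```

Under these assumptions a single high-quality call can outweigh several low-quality ones, which is the kind of behaviour a quality-aware consensus step is meant to capture and a plain majority vote would miss.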
Keywords :
DNA; belief networks; bioinformatics; genomics; molecular biophysics; molecular configurations; polymorphism; probability; BECA; Bayesian Estimation for Comparative Assembly; IDITAP; Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places; MDL principle; MiB; Q-values; SNP; base call probability; comparative genome assembly processing pipeline; de Bruijn graph based algorithm; minimum description length principle; read sets; reference sequence model; sequence alignments; single nucleotide polymorphisms; Bayesian Estimation; Genome Assembly; Minimum Description Length; de-Bruijn Graphs
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Genomic Signal Processing and Statistics, (GENSIPS), 2012 IEEE International Workshop on
Conference_Location :
Washington, DC
ISSN :
2150-3001
Print_ISBN :
978-1-4673-5234-5
Type :
conf
DOI :
10.1109/GENSIPS.2012.6507733
Filename :
6507733