DocumentCode :
2413085
Title :
Evaluation of short read metagenomic assembly
Author :
Charuvaka, Anveshi ; Rangwala, Huzefa
fYear :
2010
fDate :
18-21 Dec. 2010
Firstpage :
171
Lastpage :
178
Abstract :
Assembling short reads obtained from community samples using next-generation sequencing technologies is challenging for several reasons. In this study, we assess the performance of a state-of-the-art Eulerian-path-based assembler on a series of simulated datasets of varying complexity. We evaluate the feasibility of metagenomic assembly with reads restricted to a length of 36 base pairs, as obtained from the Solexa/Illumina platform. We developed a pipeline to evaluate assembly quality based on contig length statistics and accuracy. We studied the effect of the overlap parameters used for metagenomic assembly and developed a clustering solution that pools the contigs obtained from different runs of the assembly algorithm, which allowed us to obtain longer contigs. We computed an entropy/impurity metric to assess the degree of chimericity in the assembled contigs, and we compared the metagenomic assemblies to the best possible solution that could be obtained by assembling the individual source genomes. Our results show that accuracy was better than expected for metagenomic samples with a few dominant organisms and was especially poor for samples containing many closely related strains.
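The abstract names contig length statistics and an entropy/impurity metric for chimericity without defining them. The following is a minimal Python sketch under stated assumptions, not the authors' method: N50 is assumed as a representative contig length statistic, and chimericity is assumed to be measured from the source-genome label of each read placed in a contig (labels that are known for simulated data). The function names are hypothetical.

```python
import math
from collections import Counter


def contig_entropy(read_sources):
    """Shannon entropy (bits) of the source-genome labels of the reads in one contig.

    Assumed definition: 0.0 means every read comes from a single genome (a pure
    contig); larger values indicate reads mixed from several genomes (chimeric).
    """
    counts = Counter(read_sources)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def contig_impurity(read_sources):
    """Assumed definition: fraction of reads not from the contig's majority genome."""
    counts = Counter(read_sources)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return 1.0 - max(counts.values()) / total


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # no contigs


# Hypothetical usage: a contig whose reads come half from genome "A", half from "B".
print(contig_entropy(["A", "A", "B", "B"]))  # 1.0
print(contig_impurity(["A", "A", "A", "B"]))  # 0.25
print(n50([100, 80, 50, 30, 20]))  # 80
```

Entropy rewards a contig only when its reads are concentrated in one source genome, whereas impurity reports just the share outside the majority genome; either can be averaged over all contigs of an assembly.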
Keywords :
bioinformatics; data analysis; entropy; genomics; molecular biophysics; Eulerian-path based assembler; Illumina platform; Solexa platform; assembly algorithm; chimericity; clustering solution; contig length statistics; entropy metric; impurity metric; short read metagenomic assembly; simulated dataset; Accuracy; Assembly; Complexity theory; Entropy; Genomics; Organisms; Strain;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8306-8
Electronic_ISBN :
978-1-4244-8307-5
Type :
conf
DOI :
10.1109/BIBM.2010.5706558
Filename :
5706558