DocumentCode :
3436987
Title :
A Tolerance Graph Approach for Domain-Specific Assembly of Next Generation Sequencing Data
Author :
Warnke, Julia ; Ali, Hamza
Author_Institution :
Dept. of Comput. Sci., Univ. of Nebraska Omaha, Omaha, NE, USA
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
88
Lastpage :
95
Abstract :
Next generation sequencing (NGS) has become a major focus in many recent biological research applications. NGS produces thousands to millions of short DNA fragments in a single run. Individually, these fragments represent only a small fraction of the original biological sample. To obtain useful information, overlapping fragments must be assembled into long stretches of contiguous sequence. Various assemblers have been developed to address the fragment assembly problem. Most current assemblers were developed to fill an important gap; however, they were designed with a purely computational focus and do not take the properties of the input datasets into consideration. NGS dataset characteristics such as fragment coverage and underlying genome complexity vary dramatically between sequencing applications. Generic, data-independent assemblers are therefore unlikely to produce accurate solutions in all problem domains. In this study, we propose a graph-theoretic approach based on the concept of tolerance graphs to develop a domain-specific assembler. The proposed assembler is designed to extract signals associated with local features in the input dataset and reintegrate this knowledge into the assembly process through customized tolerance graph parameters. We conducted a number of experiments to study the impact of various input parameters on the quality of the assembled genomes. Results from this study show that the proposed assembler produces excellent results and outperforms other known assembly algorithms for some input datasets. This approach also provides a foundation for developing domain-specific assemblers that can be applied in an intelligent and customized manner to a wide variety of input instances, resulting in more efficient assembly tactics and improved overall assembly quality.
Keywords :
DNA; biology computing; data handling; graph theory; DNA fragments; NGS dataset characteristics; biological research applications; domain specific assembly; generic assemblers; genome complexity; graph approach tolerance; graph theoretic approach; next generation sequencing data; overlapping fragments; Assembly; Bioinformatics; Feature extraction; Genomics; Mathematical model; Next generation networking; Sequential analysis; Graph theory; Knowledge-based genome assembly; Next generation sequencing; Tolerance graph;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
Type :
conf
DOI :
10.1109/ICDMW.2013.105
Filename :
6753907