DocumentCode :
3496120
Title :
Efficient algorithms for sequence assembly
Author :
Rajasekaran, Sanguthevar ; Saha, Subrata
Author_Institution :
Dept. of CSE, Univ. of Connecticut, Storrs, CT, USA
fYear :
2013
fDate :
12-14 June 2013
Firstpage :
1
Lastpage :
1
Abstract :
Sequencing genomes is one of the most fundamental problems in modern biology and has immense impact on biomedical research. De novo sequencing is computationally more challenging than sequencing with a reference genome. Repeats, for instance, make genome assembly extremely difficult: the locations of reads shorter than the repeat length cannot be resolved uniquely. Moreover, existing sequencing technology is not mature enough to read the entire sequence of a genome, especially for complex organisms such as mammals. However, small fragments of the genome can be read with acceptable accuracy. The shotgun sequencing employed in many sequencing projects breaks the genome randomly at many places and generates a large number of small fragments (called reads) of the genome. The problem of reassembling all the fragmented reads into a sequence close to the original sequence is known as the Sequence Assembly (SA) problem. Sequence assembly is complex for various reasons, including short reads, errors in sequencing, and the presence of repeats. In this talk we survey some of the algorithms (specifically [1, 2, 3]) that have been proposed for SA. Both de Bruijn graph based and overlap graph based algorithms will be discussed. We also summarize a recent algorithm we have developed for the problem of scaffolding.
Keywords :
biology computing; genomics; graph theory; De novo sequencing; Sequence Assembly problem; biomedical research; complex organism; de Bruijn based algorithm; genome assembly; genome sequence identification; genome sequencing; mammal; overlap graph based algorithm; reference genome; repeat length; scaffolding based algorithms; sequencing technology; shotgun sequencing; Assembly; Bioinformatics; Educational institutions; Genomics; Organisms; Sequential analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Advances in Bio and Medical Sciences (ICCABS), 2013 IEEE 3rd International Conference on
Conference_Location :
New Orleans, LA
Type :
conf
DOI :
10.1109/ICCABS.2013.6629223
Filename :
6629223