DocumentCode
2227254
Title
Removing contamination from genomic sequences based on vector reference libraries
Author
Bagci, Caner ; Allmer, Jens
Author_Institution
Mol. Biol. & Genetics, Izmir Inst. of Technol., Izmir, Turkey
fYear
2012
fDate
19-22 April 2012
Firstpage
118
Lastpage
122
Abstract
DNA is often sequenced after being cloned into a vector because this allows standard primers to be used and removes the need to develop custom primers. As a result, a certain amount of vector is sequenced along with the sequence of interest. Occasionally, these contaminating vector sequences find their way into public databases as part of submitted sequences. It has been pointed out that SeqClean, a program used to remove vector contamination from sequences, does not take into account that vectors are circular structures. A workaround has been presented before; we simplify the process and additionally provide an implementation. We further applied our method to a test set of EST sequences and analyzed the amount of contamination found in the EST sequences available on NCBI.
Keywords
DNA; bioinformatics; genomics; DNA; EST sequences; NCBI; SeqClean; circular structure; genomic sequences; public database; standard primer; vector contamination; vector reference libraries; vector sequences; Bioinformatics; Cleaning; Contamination; Databases; Libraries; Software; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Health Informatics and Bioinformatics (HIBIT), 2012 7th International Symposium on
Conference_Location
Nevsehir
Print_ISBN
978-1-4673-0879-3
Type
conf
DOI
10.1109/HIBIT.2012.6209053
Filename
6209053