• DocumentCode
    2227254
  • Title
    Removing contamination from genomic sequences based on vector reference libraries
  • Author
    Bagci, Caner; Allmer, Jens
  • Author_Institution
    Mol. Biol. & Genetics, Izmir Inst. of Technol., Izmir, Turkey
  • fYear
    2012
  • fDate
    19-22 April 2012
  • Firstpage
    118
  • Lastpage
    122
  • Abstract
    DNA is often sequenced after being cloned into a vector, because this allows standard primers to be used and removes the need to design custom primers. As a result, part of the vector is sequenced along with the sequence of interest. Occasionally, these contaminating vector sequences end up in public databases as part of submitted sequences. It has been pointed out that SeqClean, a program used to remove vector contamination from sequences, does not take into account that vectors are circular. A workaround has been presented before; we simplify the process and, additionally, provide an implementation. We further applied our method to a test set of EST sequences and analyzed the amount of contamination in the EST sequences available from NCBI.
  • Keywords
    DNA; bioinformatics; genomics; DNA; EST sequences; NCBI; SeqClean; circular structure; genomic sequences; public database; standard primer; vector contamination; vector reference libraries; vector sequences; Bioinformatics; Cleaning; Contamination; Databases; Libraries; Software; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Health Informatics and Bioinformatics (HIBIT), 2012 7th International Symposium on
  • Conference_Location
    Nevsehir
  • Print_ISBN
    978-1-4673-0879-3
  • Type
    conf
  • DOI
    10.1109/HIBIT.2012.6209053
  • Filename
    6209053
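The abstract's point that vectors are circular can be illustrated with a toy example. The sketch below is not the authors' method, because the record does not describe their workaround. It only shows, assuming exact string matching, why treating a circular vector as a linear string misses contamination that spans the point where the vector sequence was linearized, and how searching the vector extended by its own first bases recovers such hits. The function names and the toy sequence are hypothetical.

```python
def occurs_linear(vector: str, fragment: str) -> bool:
    """Naive check: treat the vector as an ordinary linear string."""
    return fragment in vector


def circular_hits(vector: str, fragment: str) -> list[int]:
    """Return 0-based start positions (in vector coordinates) at which
    `fragment` occurs when `vector` is treated as a circular sequence.

    Appending the first len(fragment) - 1 bases of the vector to its end
    makes every wrap-around substring appear contiguously, and every
    possible start position lies in [0, len(vector) - 1].
    """
    n, m = len(vector), len(fragment)
    if m == 0 or m > n:
        return []  # this sketch ignores fragments longer than the vector
    extended = vector + vector[: m - 1]
    hits = []
    start = extended.find(fragment)
    while start != -1:
        hits.append(start)
        start = extended.find(fragment, start + 1)
    return hits


if __name__ == "__main__":
    toy_vector = "ACGTTTGCA"  # hypothetical toy sequence, not a real vector
    junction = "GCAACG"       # spans the end ("GCA") and the start ("ACG")
    print(occurs_linear(toy_vector, junction))   # False: the linear search misses it
    print(circular_hits(toy_vector, junction))   # [6]: found across the junction
```

Real vector screening tolerates sequencing errors through alignment rather than exact matching, and it also searches the reverse complement; this sketch deliberately leaves both out to isolate the circularity issue.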