• DocumentCode
    2319728
  • Title
    An efficient machine learning approach to low-complexity filtering in biological sequences
  • Author
    Barber, Christopher A. ; Oehmen, Christopher S.
  • Author_Institution
    Pacific Northwest Nat. Lab., Richland, WA, USA
  • fYear
    2012
  • fDate
    9-12 May 2012
  • Firstpage
    237
  • Lastpage
    243
  • Abstract
    Biological sequences contain low-complexity regions (LCRs) that produce superfluous matches in homology searches and slow the execution of database search algorithms such as BLAST. These regions are efficiently identified by low-complexity filtering algorithms such as SDUST and SEG, which are included in the BLAST tool suite. Because these algorithms target differing notions of complexity, an algorithm that combines their sensitivities is pursued. A variety of features are derived from these algorithms and from a new filtering algorithm based on Lempel-Ziv complexity. Artificial sequences with known LCRs are used to train and evaluate an SVM classifier, which significantly outperforms the standalone filtering algorithms.
  • Keywords
    bioinformatics; biological techniques; learning (artificial intelligence); molecular biophysics; search problems; support vector machines; Lempel-Ziv complexity based filtering algorithm; SDUST; SEG; SVM classifier; biological sequences; database search algorithms; homology searches; low complexity filtering algorithms; low complexity regions; machine learning approach; superfluous matches; support vector machine; Accuracy; Complexity theory; DNA; Entropy; Markov processes; Proteins; Support vector machines; bioinformatics; complexity measures; filtering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2012 IEEE Symposium on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-1-4673-1190-8
  • Type
    conf
  • DOI
    10.1109/CIBCB.2012.6217236
  • Filename
    6217236