• DocumentCode
    3264740
  • Title

    Segment and Combine Approach for Biological Sequence Classification

  • Author

    Geurts, Pierre ; Cuesta, Antia Blanco ; Wehenkel, Louis

  • Author_Institution
    Department of Electrical Engineering and Computer Science, CBIG - Center of Biomedical Integrative Genoproteomics, University of Liège, Belgium, Email: p.geurts@ulg.ac.be
  • fYear
    2005
  • fDate
    14-15 Nov. 2005
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    This paper presents a new algorithm, based on the segment and combine paradigm, for the automatic classification of biological sequences. It classifies a sequence by aggregating the predictions made for its subsequences by a classifier that is learned from a random sample of training subsequences. This generic approach is combined with decision-tree-based ensemble methods, which scale well with respect to both sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. Compared with standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility of exploiting the resulting models to identify interpretable patterns specific to a given class of biological sequences.
  • Keywords
    Bioinformatics; Biological system modeling; DNA; Decision trees; Machine learning; Machine learning algorithms; Scalability; Sequences; Support vector machines; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on
  • Print_ISBN
    0-7803-9387-2
  • Type

    conf

  • DOI
    10.1109/CIBCB.2005.1594917
  • Filename
    1594917
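
The segment and combine idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract pairs the approach with decision-tree-based ensembles, whereas this sketch swaps in a simple smoothed count table as the subsequence classifier so that it runs on the standard library alone. The names `sample_subsequences`, `CountTableClassifier`, and `classify`, the subsequence length of 4, and the toy DNA strings are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def sample_subsequences(sequences, labels, length, n_samples, rng):
    """Segment step: draw random fixed-length subsequences, each labeled
    with the class of the sequence it came from.
    Assumes every sequence has at least `length` symbols."""
    samples = []
    for _ in range(n_samples):
        i = rng.randrange(len(sequences))
        start = rng.randrange(len(sequences[i]) - length + 1)
        samples.append((sequences[i][start:start + length], labels[i]))
    return samples


class CountTableClassifier:
    """Hypothetical stand-in for the tree ensembles named in the abstract:
    a per-subsequence class count table with add-one smoothing."""

    def fit(self, samples):
        self.classes = sorted({label for _, label in samples})
        self.counts = defaultdict(Counter)
        for subseq, label in samples:
            self.counts[subseq][label] += 1
        return self

    def predict_proba(self, subseq):
        # Unseen subsequences fall back to a uniform distribution.
        seen = self.counts.get(subseq, Counter())
        total = sum(seen.values()) + len(self.classes)
        return {c: (seen[c] + 1) / total for c in self.classes}


def classify(sequence, model, length):
    """Combine step: average the class probabilities predicted for every
    window of the sequence and return the class with the highest average."""
    votes = Counter()
    n_windows = len(sequence) - length + 1
    for start in range(n_windows):
        window = sequence[start:start + length]
        for c, p in model.predict_proba(window).items():
            votes[c] += p / n_windows
    return max(votes, key=votes.get)


# Toy data (made up): class "A" repeats ACG, class "B" repeats TTG.
rng = random.Random(0)
train_seqs = ["ACGACGACGACGACG", "CGACGACGACGACGA",
              "TTGTTGTTGTTGTTG", "TGTTGTTGTTGTTGT"]
train_labels = ["A", "A", "B", "B"]
model = CountTableClassifier().fit(
    sample_subsequences(train_seqs, train_labels, 4, 200, rng))
print(classify("GACGACGACG", model, 4))
```

Averaging per-window probabilities is only one plausible way to aggregate subsequence predictions; the record does not say which aggregation rule the paper uses.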