DocumentCode
3264740
Title
Segment and Combine Approach for Biological Sequence Classification
Author
Geurts, Pierre ; Cuesta, Antia Blanco ; Wehenkel, Louis
Author_Institution
Department of Electrical Engineering and Computer Science, CBIG - Center of Biomedical Integrative Genoproteomics, University of Liège, Belgium, Email: p.geurts@ulg.ac.be
fYear
2005
fDate
14-15 Nov. 2005
Firstpage
1
Lastpage
8
Abstract
This paper presents a new algorithm, based on the segment and combine paradigm, for the automatic classification of biological sequences. It classifies a sequence by aggregating the predictions made for its subsequences by a classifier learned from a random sample of training subsequences. This generic approach is combined with decision-tree-based ensemble methods, which scale with both sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. Compared with standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility of exploiting the resulting models to identify interpretable patterns specific to a given class of biological sequences.
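The segment and combine idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names here (`sample_subsequences`, `WindowCountClassifier`, `classify_sequence`) are hypothetical. The base learner is a simple per-window frequency table standing in for the decision tree ensembles used in the paper. The fixed window length and the averaging of class probabilities over all windows of a test sequence are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def sample_subsequences(sequences, labels, length, per_sequence, rng):
    """Segment step: draw random fixed-length windows from each training
    sequence and label each window with the class of its parent sequence."""
    samples = []
    for seq, label in zip(sequences, labels):
        if len(seq) < length:
            continue  # sequence too short to yield a window
        for _ in range(per_sequence):
            start = rng.randrange(len(seq) - length + 1)
            samples.append((seq[start:start + length], label))
    return samples


class WindowCountClassifier:
    """Stand-in base learner (NOT the paper's tree ensemble): estimates class
    probabilities for a window from smoothed class counts of that window."""

    def fit(self, samples):
        self.classes = sorted({label for _, label in samples})
        self.counts = defaultdict(Counter)
        for window, label in samples:
            self.counts[window][label] += 1
        return self

    def predict_proba(self, window):
        counts = self.counts.get(window, Counter())
        total = sum(counts.values()) + len(self.classes)  # add-one smoothing
        return {c: (counts[c] + 1) / total for c in self.classes}


def classify_sequence(model, seq, length):
    """Combine step (assumed aggregation): sum the window-level class
    probabilities over every window of the sequence and return the top class."""
    totals = Counter()
    for i in range(len(seq) - length + 1):
        for c, p in model.predict_proba(seq[i:i + length]).items():
            totals[c] += p
    return max(model.classes, key=lambda c: totals[c])


if __name__ == "__main__":
    rng = random.Random(0)
    train = ["ACACACACAC", "GTGTGTGTGT"]  # toy data, invented for this sketch
    samples = sample_subsequences(train, ["x", "y"], length=3, per_sequence=5, rng=rng)
    model = WindowCountClassifier().fit(samples)
    print(classify_sequence(model, "ACACAC", length=3))
```

Replacing `WindowCountClassifier` with any learner that outputs class probabilities for a window leaves the segment and combine steps unchanged, which is the sense in which the abstract calls the approach generic.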
Keywords
Bioinformatics; Biological system modeling; DNA; Decision trees; Machine learning; Machine learning algorithms; Scalability; Sequences; Support vector machines; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on
Print_ISBN
0-7803-9387-2
Type
conf
DOI
10.1109/CIBCB.2005.1594917
Filename
1594917