DocumentCode :
3264740
Title :
Segment and Combine Approach for Biological Sequence Classification
Author :
Geurts, Pierre ; Cuesta, Antia Blanco ; Wehenkel, Louis
Author_Institution :
Department of Electrical Engineering and Computer Science, CBIG - Center of Biomedical Integrative Genoproteomics, University of Liège, Belgium, Email: p.geurts@ulg.ac.be
fYear :
2005
fDate :
14-15 Nov. 2005
Firstpage :
1
Lastpage :
8
Abstract :
This paper presents a new algorithm, based on the segment and combine paradigm, for the automatic classification of biological sequences. It classifies a sequence by aggregating the predictions made for its subsequences by a classifier that is learned from a random sample of training subsequences. This generic approach is combined with decision-tree-based ensemble methods, which scale well with respect to both sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. Compared with standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility of exploiting the resulting models to identify interpretable patterns specific to a given class of biological sequences.
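The segment and combine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fixed subsequence length, and the rule of summing class probabilities over all subsequences are assumptions made for illustration. The base learner here is a simple smoothed count table, swapped in for the decision tree ensembles named in the abstract so that the example stays self-contained.

```python
import random
from collections import Counter, defaultdict


def sample_subsequences(sequences, labels, length, n_samples, rng):
    """Segment step: draw random fixed-length subsequences, each
    labeled with the class of the sequence it was taken from."""
    pairs = []
    for _ in range(n_samples):
        i = rng.randrange(len(sequences))
        seq = sequences[i]
        start = rng.randrange(len(seq) - length + 1)
        pairs.append((seq[start:start + length], labels[i]))
    return pairs


def train_subsequence_model(pairs):
    """Stand-in base learner (assumption): per-class counts of each
    exact subsequence. The paper uses decision tree ensembles instead."""
    counts = defaultdict(Counter)
    for sub, label in pairs:
        counts[sub][label] += 1
    return counts


def predict_proba(model, sub, classes):
    """Class probabilities for one subsequence, with add-one smoothing
    so that unseen subsequences yield a uniform distribution."""
    seen = model.get(sub)  # .get avoids inserting new keys
    total = (sum(seen.values()) if seen else 0) + len(classes)
    return {k: ((seen[k] if seen else 0) + 1) / total for k in classes}


def classify(model, seq, length, classes):
    """Combine step: sum the class probabilities predicted for every
    subsequence of seq and return the class with the largest total."""
    totals = {k: 0.0 for k in classes}
    for start in range(len(seq) - length + 1):
        probs = predict_proba(model, seq[start:start + length], classes)
        for k in classes:
            totals[k] += probs[k]
    return max(classes, key=lambda k: totals[k])


# Toy usage with made-up sequences and labels (hypothetical data).
toy_sequences = ["ACGTACGTACGT", "CGTACGTACGTA", "TTGGTTGGTTGG", "GGTTGGTTGGTT"]
toy_labels = ["pos", "pos", "neg", "neg"]
toy_classes = ["pos", "neg"]
toy_model = train_subsequence_model(
    sample_subsequences(toy_sequences, toy_labels, 4, 200, random.Random(0))
)
print(classify(toy_model, "GTACGTACGT", 4, toy_classes))
```

In a closer reproduction, the count table would be replaced by a tree ensemble trained on an attribute encoding of each subsequence. The sampling and aggregation steps would keep the same shape.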
Keywords :
Bioinformatics; Biological system modeling; DNA; Decision trees; Machine learning; Machine learning algorithms; Scalability; Sequences; Support vector machines; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on
Print_ISBN :
0-7803-9387-2
Type :
conf
DOI :
10.1109/CIBCB.2005.1594917
Filename :
1594917