DocumentCode :
890728
Title :
A Study of Hierarchical and Flat Classification of Proteins
Author :
Zimek, Arthur ; Buchwald, Fabian ; Frank, Eibe ; Kramer, Stefan
Author_Institution :
Inst. fuer Inf., Lehr- und Forschungseinheit fuer Datenbanksysteme, Ludwig-Maximilians-Univ. Muenchen, Muenchen, Germany
Volume :
7
Issue :
3
fYear :
2010
Firstpage :
563
Lastpage :
571
Abstract :
Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multiclass classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multiclass settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data but not in the case of the protein classification problems. Based on this, we recommend that strong flat multiclass methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
Keywords :
biological techniques; biology computing; enzymes; molecular biophysics; support vector machines; enzyme classification; fold recognition; homology detection data; multiclass settings; nested dichotomy; protein flat classification; protein hierarchical classification; Biology and genetics; Classifier design and evaluation; Data mining; Machine learning; Protein classification; Sciences; hierarchical classification; multiclass classification; Algorithms; Artificial Intelligence; Computing Methodologies; Molecular Sequence Data; Pattern Recognition, Automated; Protein Folding; Proteins
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2008.104
Filename :
4641909