DocumentCode
1605046
Title
Learning from soft partitions of data: reducing the variance
Author
Eschrich, Sebastian ; Hall, Lawrence O.
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Volume
1
fYear
2003
Firstpage
666
Abstract
Distributed machine learning can be realized using a divide-and-conquer methodology. One such divide-and-conquer method is learning from soft partitions of data. By examining the decomposition of classifier error into bias and variance terms, we see that learning from smaller partitions of data introduces higher variance. In this paper, we investigate the use of a particular variance reduction technique, randomized C4.5, when learning from soft partitions of data. This approach maintains the distributed nature of the learning algorithm while improving overall classification accuracy. Experiments on six machine learning datasets demonstrate the accuracy gains obtained by reducing classifier variance. In particular, learning from soft partitions of data can produce a more accurate classifier than an ensemble of randomized decision trees constructed from the entire dataset, and that ensemble is in turn more accurate than a single decision tree.
Keywords
data mining; decision trees; divide and conquer methods; fuzzy set theory; learning (artificial intelligence); bias terms; classifier error decomposition; distributed machine learning; divide and conquer methodology; k-means clustering; localized bagging; randomized C4.5; soft partitions of data; variance reduction technique; variance terms; Bagging; Boosting; Classification tree analysis; Computer errors; Computer science; Decision trees; Learning systems; Machine learning; Neurons; Partitioning algorithms;
fLanguage
English
Publisher
ieee
Conference_Titel
Fuzzy Systems, 2003. FUZZ '03. The 12th IEEE International Conference on
Print_ISBN
0-7803-7810-5
Type
conf
DOI
10.1109/FUZZ.2003.1209443
Filename
1209443