Bagging is a small-data-set phenomenon

Author

Chawla, N.; Moore, T.E., Jr.; Bowyer, K.W.; Hall, L.O.; Springer, C.; Kegelmeyer, P.

Author_Institution

Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA

Volume

2

fYear

2001

fDate

8-14 Dec. 2001

Abstract

Bagging forms a committee of classifiers by bootstrap aggregation of training sets drawn from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given partitions and bags of the same size, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve datasets that are too large to fit in the memory of a typical computer. Our results indicate that, in such applications, the simple approach of building a committee of classifiers from disjoint partitions is preferable to the more complex approach of bagging.
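The two committee-building schemes compared in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: the nearest-centroid base learner (`train_nearest_centroid`), the helper names, the plurality-vote combiner and the toy two-class data are all hypothetical stand-ins chosen to keep the example self-contained. The only point it encodes from the abstract is the same-size comparison: each bootstrap bag is drawn with replacement and has exactly as many examples as each disjoint partition.

```python
import random
from collections import Counter

def bootstrap_bags(data, n_bags, bag_size, seed=0):
    """Bagging: each bag draws bag_size examples uniformly WITH replacement."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(bag_size)] for _ in range(n_bags)]

def disjoint_partitions(data, n_parts, seed=0):
    """Alternative: shuffle once, then cut into n_parts non-overlapping subsets.
    Any remainder (len(data) % n_parts examples) is left unused."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_parts
    return [shuffled[i * size:(i + 1) * size] for i in range(n_parts)]

def train_nearest_centroid(subset):
    """Hypothetical stand-in base learner: store one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in subset:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x):
        # Assign x to the class whose centroid is closest (squared Euclidean).
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict

def committee_predict(models, x):
    """Combine committee members by plurality vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Toy two-class data (illustrative only, not the paper's datasets).
rng = random.Random(42)
data = ([((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(200)]
        + [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(200)])

n = 4
parts = disjoint_partitions(data, n)
# Same-size comparison: each bag has as many examples as each partition.
bags = bootstrap_bags(data, n, bag_size=len(parts[0]))

partition_committee = [train_nearest_centroid(p) for p in parts]
bagging_committee = [train_nearest_centroid(b) for b in bags]

print(committee_predict(partition_committee, (0.0, 0.0)))
print(committee_predict(bagging_committee, (3.0, 3.0)))
```

One plausible intuition for the difference (not necessarily the authors' explanation): a bootstrap bag of size k drawn from N examples contains, in expectation, only N(1 - (1 - 1/N)^k) distinct examples, which is fewer than k, whereas a disjoint partition of size k always contains k distinct examples. Partitioning also means each committee member only ever needs its own subset in memory, which matches the large-dataset setting the abstract describes.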

Keywords

data mining; learning (artificial intelligence); pattern classification; bagging; bootstrap aggregation; classifier committee; disjoint partitions; protein structure prediction; small dataset; training data pool; training sets; Aggregates; Application software; Bagging; Computer science; Data mining; Laboratories; Proteins; Sampling methods; Testing; Training data;

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on

Conference_Location

Kauai, HI, USA

ISSN

1063-6919

Print_ISBN

0-7695-1272-0

Type

conf

DOI

10.1109/CVPR.2001.991030

Filename

991030