Title of article

Corrective classification: Learning from data imperfections with aggressive and diverse classifier ensembling

Author/Authors

Yan Zhang, Xingquan Zhu, Xindong Wu, Jeffrey P. Bond

Issue Information

Journal with serial number, year 2011

Pages

From page

1135

To page

1157

Abstract

Learning from imperfect (noisy) information sources is a challenging, real-world issue for many data mining applications. Common practices include enhancing data quality with data preprocessing techniques, or employing robust learning algorithms that avoid developing overly complicated structures that overfit the noise. The essential goal is to reduce noise impact and thereby improve the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling or robust learning algorithms in two aspects. On one hand, the diverse base learners that constitute the C2 ensemble are constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction and data cleansing to reduce noise impact. Being corrective, the classifier ensemble is built from data preprocessed and corrected by the data cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than the learner built from the original noisy sources, but also more reliable than Bagging or the aggressive classifier ensemble (ACE), which are two degenerate components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, which are two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e., noisy training and/or test data), C2 is able to deliver more accurate and reliable prediction models than its peers.
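
The combination described in the abstract (Bootstrap sampling, per-sample error correction, then ensembling) can be illustrated with a minimal sketch. This is not the authors' C2 algorithm, whose details are not given in this record. The nearest-centroid base learner, the relabeling rule in correct_labels, the toy data generator and every function name below are assumptions chosen only to keep the example short and self-contained.

```python
import random
from collections import Counter

def fit_centroids(data):
    """Return {label: mean feature vector} for a list of (features, label) pairs."""
    sums, counts = {}, Counter()
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

def correct_labels(data):
    """Assumed correction rule (not from the paper): relabel each instance
    to what a model trained on the same noisy data predicts for it."""
    model = fit_centroids(data)
    return [(x, predict(model, x)) for x, _ in data]

def build_ensemble(data, n_members=11, seed=0):
    """Train one base learner per bootstrap sample, each on corrected labels."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(data) for _ in data]  # draw with replacement
        members.append(fit_centroids(correct_labels(sample)))
    return members

def ensemble_predict(members, x):
    """Majority vote over the base learners."""
    return Counter(predict(m, x) for m in members).most_common(1)[0][0]

def make_noisy_data(n_per_class=30, n_flipped=3, seed=1):
    """Hypothetical toy data: two well-separated 2-D clusters, with the
    first n_flipped labels of each cluster flipped to simulate class noise."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 10.0)):
        for i in range(n_per_class):
            x = (center + rng.uniform(-1, 1), center + rng.uniform(-1, 1))
            data.append((x, 1 - label if i < n_flipped else label))
    return data

noisy = make_noisy_data()
ensemble = build_ensemble(noisy)
print(ensemble_predict(ensemble, (0.5, -0.2)))
```

In this sketch, dropping the correct_labels call reduces the procedure to plain Bagging, which mirrors the abstract's remark that Bagging is a degenerate variant of C2. An odd number of members avoids tied votes in the two-class case.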

Keywords

Bagging, error correction, Bootstrap sampling, Classifier ensemble, Noisy data

Journal title

Information Systems

Serial Year

2011

Record number

1230235

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=1230235