• Title of article

    Corrective classification: Learning from data imperfections with aggressive and diverse classifier ensembling

  • Author/Authors

    Yan Zhang, Xingquan Zhu, Xindong Wu, Jeffrey P. Bond

  • Issue Information
    Journal, serial issue, year 2011
  • Pages
    23
  • From page
    1135
  • To page
    1157
  • Abstract
    Learning from imperfect (noisy) information sources is a challenging, real-world issue for many data mining applications. Common practices include enhancing data quality by applying data preprocessing techniques, or employing robust learning algorithms that avoid developing overly complicated structures that overfit the noise. The essential goal is to reduce the impact of noise and thereby improve the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling or robust learning algorithms in two aspects. On one hand, the diverse base learners that constitute the C2 ensemble are constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction and data cleansing to reduce noise impact. Being corrective, the classifier ensemble is built from data preprocessed/corrected by the data cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than the learner built from the original noisy sources, but also more reliable than Bagging or the aggressive classifier ensemble (ACE), which are two degenerate components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, which are two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e., noisy training and/or test data), C2 is able to deliver more accurate and reliable prediction models than its peers.
  • Keywords
    Bagging, error correction, Bootstrap sampling, classifier ensemble, noisy data
  • Journal title
    Information Systems
  • Serial Year
    2011
  • Record number

    1230235
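
The abstract describes C2 only at a high level: Bootstrap sampling produces diverse base learners, and each learner is trained on data that has passed through error detection and correction. The following is a minimal sketch of that general idea, not the authors' algorithm. The nearest-centroid base learner, the cross-validated relabeling rule, and all function names and parameters (`train_centroid`, `correct_labels`, `corrective_ensemble`, `folds`, `n_members`) are assumptions made for illustration.

```python
import random
from collections import Counter


def train_centroid(rows):
    """Toy base learner (assumed here): nearest class centroid.

    rows is a list of (features, label) pairs; returns a predict function.
    """
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Pick the label whose centroid has the smallest squared distance to x.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def correct_labels(rows, folds=3):
    """Assumed correction rule: relabel each instance with the prediction
    of a model trained on the other folds."""
    corrected = list(rows)
    for k in range(folds):
        train = [r for i, r in enumerate(rows) if i % folds != k]
        model = train_centroid(train)
        for i in range(k, len(rows), folds):
            x, _ = rows[i]
            corrected[i] = (x, model(x))
    return corrected


def corrective_ensemble(rows, n_members=11, seed=0):
    """Train each member on a corrected Bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(rows) for _ in rows]  # Bootstrap: sample with replacement
        members.append(train_centroid(correct_labels(sample)))

    def predict(x):
        return Counter(m(x) for m in members).most_common(1)[0][0]

    return predict
```

Calling `corrective_ensemble(noisy_rows)` returns a function that maps a feature tuple to a predicted label. Skipping `correct_labels` reduces the sketch to plain Bagging, which matches the abstract's remark that Bagging is a degenerate variant of C2.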