  • DocumentCode
    2350091
  • Title
    Identifying learners robust to low quality data
  • Author
    Folleco, Andres ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Bullard, Lofton
  • Author_Institution
    Florida Atlantic University, Boca Raton, USA
  • fYear
    2008
  • fDate
    13-15 July 2008
  • Firstpage
    190
  • Lastpage
    195
  • Abstract
    Real-world datasets commonly contain noise that is distributed in both the independent and dependent variables. Noise, which typically consists of erroneous variable values, has been shown to significantly affect the classification performance of learners. In this study, we identify learners with robust performance in the presence of low-quality (noisy) measurement data. Noise was injected into five class-imbalanced software engineering measurement datasets that were initially relatively free of noise. The experimental factors considered were the learner used, the level of injected noise, the dataset used (each dataset has unique properties), and the percentage of minority instances containing noise. We found no other related studies that identify learners robust to low-quality measurement data. Based on the results of this study, we recommend using the random forest learner for building classification models from noisy data.
  • Keywords
    Data mining; Decision trees; Machine learning; Noise level; Noise measurement; Noise robustness; Software measurement; Support vector machine classification; Support vector machines; Working environment noise; learning performance; quality of data; random forest; software measurement data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV, USA
  • Print_ISBN
    978-1-4244-2659-1
  • Electronic_ISBN
    978-1-4244-2660-7
  • Type
    conf
  • DOI
    10.1109/IRI.2008.4583028
  • Filename
    4583028