• DocumentCode
    113941
  • Title
    An empirical study of filter-based feature selection algorithms using noisy training data
  • Author
    Weiwei Yuan ; Donghai Guan ; Linshan Shen ; Haiwei Pan
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
  • fYear
    2014
  • fDate
    26-28 April 2014
  • Firstpage
    209
  • Lastpage
    212
  • Abstract
    In this research, we empirically evaluate the performance of filter-based feature selection on noisy data containing mislabeled samples. Mislabeled data are present in many real applications, but existing studies have not explored their influence on feature selection. We tested six well-known filter feature selection methods on datasets with pre-defined mislabeled ratios. Our results show that, in most cases, feature selection performance degrades as the mislabeled ratio increases. We also evaluate the effects of mislabeled data on feature selection with small-size data and outline the more serious negative effects of mislabeled data in that setting. The results of this study suggest that most feature selection methods are not robust enough for noisy data containing mislabeled samples. Therefore, proper processing of noisy data before feature selection should be considered.
  • Keywords
    data handling; learning (artificial intelligence); data feature selection; filter-based feature selection algorithms; mislabeled data; mislabeled ratio; noisy data processing; noisy training data; Accuracy; Filtering algorithms; Noise; Noise measurement; Training; Training data; feature selection; small size data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Technology (ICIST), 2014 4th IEEE International Conference on
  • Conference_Location
    Shenzhen
  • Type
    conf
  • DOI
    10.1109/ICIST.2014.6920367
  • Filename
    6920367