DocumentCode
113941
Title
An empirical study of filter-based feature selection algorithms using noisy training data
Author
Weiwei Yuan ; Donghai Guan ; Linshan Shen ; Haiwei Pan
Author_Institution
Dept. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
fYear
2014
fDate
26-28 April 2014
Firstpage
209
Lastpage
212
Abstract
In this research, we empirically evaluate the performance of filter-based feature selection on noisy data containing mislabeled samples. Mislabeled data are present in many real applications, but existing studies have not explored their influence on feature selection. We tested six well-known filter feature selection methods on datasets with pre-defined mislabeled ratios. Our results show that, in most cases, feature selection performance degrades as the mislabeled ratio increases. We also evaluate the effects of mislabeled data on feature selection for small datasets and outline the more serious negative effects that mislabeled data have in that setting. The results of this study suggest that most feature selection methods are not robust enough for noisy data containing mislabeled samples. Therefore, proper processing of noisy data before feature selection should be considered.
Keywords
data handling; learning (artificial intelligence); data feature selection; filter-based feature selection algorithms; mislabeled data; mislabeled ratio; noisy data processing; noisy training data; Accuracy; Filtering algorithms; Noise; Noise measurement; Training; Training data; feature selection; small size data
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Technology (ICIST), 2014 4th IEEE International Conference on
Conference_Location
Shenzhen
Type
conf
DOI
10.1109/ICIST.2014.6920367
Filename
6920367