DocumentCode
2904377
Title
Training data selection based on fuzzy c-means
Author
Guan, Donghai ; Yuan, Weiwei ; Lee, Young Koo ; Lee, Sungyoung
Author_Institution
Dept. of Comput. Eng., Kyung Hee Univ., Seoul
fYear
2008
fDate
1-6 June 2008
Firstpage
761
Lastpage
765
Abstract
The performance of supervised learning can be improved when valuable data are selected for training. In this paper, we propose three data selection methods based on the fuzzy C-means algorithm: center-based selection, border-based selection, and bin-based selection. In center-based selection, the data with a high degree of membership in each cluster are selected for training. In border-based selection, the data around the borders between clusters are selected. In bin-based selection, the data in each cluster are sorted by membership degree, the sorted data in each cluster are divided into bins, and one data point is selected from each bin for training. The effects of the three methods are studied empirically on a set of UCI data sets. Experimental results indicate that bin-based selection can effectively improve learning performance compared with randomly selecting training samples.
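The three selection strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fuzzy C-means membership matrix has already been computed (one row per sample, one column per cluster), and the function names, the use of the gap between the two largest memberships as a proxy for "near a border", and the random draw within each bin are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def hard_assign(memberships):
    """Group sample indices by the cluster in which they have the highest membership."""
    clusters = {}
    for i, row in enumerate(memberships):
        k = max(range(len(row)), key=row.__getitem__)
        clusters.setdefault(k, []).append(i)
    return clusters


def center_based(memberships, per_cluster):
    """Pick the per_cluster samples with the highest membership in each cluster."""
    selected = []
    for k, idx in hard_assign(memberships).items():
        ranked = sorted(idx, key=lambda i: memberships[i][k], reverse=True)
        selected.extend(ranked[:per_cluster])
    return sorted(selected)


def border_based(memberships, total):
    """Pick the samples whose two largest memberships are closest.

    Assumption: a small gap is used as a proxy for lying near a cluster border.
    Requires at least two clusters.
    """
    def margin(i):
        top = sorted(memberships[i], reverse=True)
        return top[0] - top[1]

    ranked = sorted(range(len(memberships)), key=margin)
    return sorted(ranked[:total])


def bin_based(memberships, bins_per_cluster, rng=None):
    """Sort each cluster by membership, split it into bins, draw one sample per bin.

    Assumption: the sample within each bin is drawn at random.
    """
    rng = rng or random.Random(0)
    selected = []
    for k, idx in hard_assign(memberships).items():
        ranked = sorted(idx, key=lambda i: memberships[i][k], reverse=True)
        n_bins = min(bins_per_cluster, len(ranked))
        for b in range(n_bins):
            lo = b * len(ranked) // n_bins
            hi = (b + 1) * len(ranked) // n_bins
            selected.append(rng.choice(ranked[lo:hi]))
    return sorted(selected)


# Hypothetical membership matrix for six samples and two clusters.
U = [
    [0.95, 0.05],
    [0.80, 0.20],
    [0.55, 0.45],
    [0.48, 0.52],
    [0.10, 0.90],
    [0.30, 0.70],
]

print(center_based(U, 1))
print(border_based(U, 2))
print(bin_based(U, 2))
```

Center-based selection favors the most typical samples of each cluster, border-based selection favors the most ambiguous ones, and bin-based selection covers the whole range of membership degrees within each cluster.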
Keywords
fuzzy set theory; learning (artificial intelligence); pattern clustering; bin-based selection; border-based selection; center-based selection; fuzzy C-means; randomly selecting training samples; supervised learning; training data selection; Fuzzy systems; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence). IEEE International Conference on
Conference_Location
Hong Kong
ISSN
1098-7584
Print_ISBN
978-1-4244-1818-3
Electronic_ISBN
1098-7584
Type
conf
DOI
10.1109/FUZZY.2008.4630456
Filename
4630456