Training data selection based on fuzzy c-means

Author

Guan, Donghai ; Yuan, Weiwei ; Lee, Young Koo ; Lee, Sungyoung

Author_Institution

Dept. of Comput. Eng., Kyung Hee Univ., Seoul

fYear

2008

fDate

1-6 June 2008

Firstpage

761

Lastpage

765

Abstract

The performance of supervised learning can be improved by selecting valuable data for training. In this paper, we propose three data selection methods based on the fuzzy C-means (FCM) algorithm: center-based selection, border-based selection, and bin-based selection. In center-based selection, the data with a high degree of membership in each cluster are selected for training. In border-based selection, the data around the borders between clusters are selected. In bin-based selection, the data in each cluster are sorted by their membership degrees, the sorted data of each cluster are divided into bins, and one sample is selected from each bin for training. The effects of these methods are studied empirically on a set of UCI data sets. Experimental results indicate that bin-based selection could effectively improve learning performance compared with randomly selecting training samples.
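The abstract describes the three selection rules only in outline. The Python sketch below shows one possible way to realize them on top of a standard fuzzy c-means implementation. Several details are assumptions and are not taken from the paper: the fuzzifier m = 2, assigning each sample to the cluster of its largest membership, measuring closeness to a border as the gap between a sample's two largest memberships, the per-cluster quota q, and drawing one sample at random from each bin. All function names are hypothetical.

```python
import math
import random


def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Standard fuzzy c-means: alternate center and membership updates.

    Returns (centers, U), where U[i][k] is the membership of sample i in
    cluster k and every row of U sums to 1.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(iters):
        # Center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * X[i][j] for i in range(n)) / tw
                            for j in range(d)])
        # Membership update: u_ik = 1 / sum_l (d_ik / d_il)^(2 / (m - 1))
        for i in range(n):
            dist = [max(math.dist(X[i], v), eps) for v in centers]
            for k in range(c):
                U[i][k] = 1.0 / sum((dist[k] / dist[other]) ** (2.0 / (m - 1))
                                    for other in range(c))
    return centers, U


def hard_labels(U):
    # Assumption: each sample belongs to the cluster of its largest membership.
    return [max(range(len(row)), key=row.__getitem__) for row in U]


def center_based(U, q):
    """Per cluster, keep the q members with the highest membership degree."""
    labels = hard_labels(U)
    chosen = []
    for k in range(len(U[0])):
        members = [i for i in range(len(U)) if labels[i] == k]
        members.sort(key=lambda i: U[i][k], reverse=True)
        chosen.extend(members[:q])
    return chosen


def border_based(U, total):
    """Keep the `total` samples whose two largest memberships are closest.

    Assumption: a small gap between the top two memberships marks a sample
    as lying near a border between clusters (requires c >= 2).
    """
    def gap(row):
        first, second = sorted(row, reverse=True)[:2]
        return first - second
    return sorted(range(len(U)), key=lambda i: gap(U[i]))[:total]


def bin_based(U, q, seed=0):
    """Per cluster: sort members by membership, split them into q contiguous
    bins, and draw one sample at random from each non-empty bin (assumed)."""
    rng = random.Random(seed)
    labels = hard_labels(U)
    chosen = []
    for k in range(len(U[0])):
        members = sorted((i for i in range(len(U)) if labels[i] == k),
                         key=lambda i: U[i][k], reverse=True)
        n_k = len(members)
        for b in range(q):
            lo, hi = b * n_k // q, (b + 1) * n_k // q
            if hi > lo:
                chosen.append(rng.choice(members[lo:hi]))
    return chosen


if __name__ == "__main__":
    # Toy data: two well-separated 2-D Gaussian blobs of 40 points each.
    rng = random.Random(1)
    X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
         + [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(40)])
    centers, U = fuzzy_c_means(X, c=2)
    print("center-based:", center_based(U, 5))
    print("border-based:", border_based(U, 10))
    print("bin-based:   ", bin_based(U, 5))
```

Because each cluster's sorted members are split into contiguous bins, bin-based selection draws samples across the whole range of membership degrees, from cluster cores to cluster edges, whereas center-based and border-based selection each concentrate on one end of that range.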

Keywords

fuzzy set theory; learning (artificial intelligence); pattern clustering; bin-based selection; border-based selection; center-based selection; fuzzy C-means; randomly selecting training samples; supervised learning; training data selection; Fuzzy systems; Training data;

fLanguage

English

Publisher

IEEE

Conference_Title

Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence). IEEE International Conference on

Conference_Location

Hong Kong

ISSN

1098-7584

Print_ISBN

978-1-4244-1818-3

Electronic_ISBN

1098-7584

Type

conf

DOI

10.1109/FUZZY.2008.4630456

Filename

4630456