• DocumentCode
    2904377
  • Title
    Training data selection based on fuzzy c-means
  • Author
    Guan, Donghai; Yuan, Weiwei; Lee, Young Koo; Lee, Sungyoung
  • Author_Institution
    Dept. of Comput. Eng., Kyung Hee Univ., Seoul
  • fYear
    2008
  • fDate
    1-6 June 2008
  • Firstpage
    761
  • Lastpage
    765
  • Abstract
    The performance of supervised learning can be improved when valuable data are selected for training. In this paper, we propose three data selection methods based on the fuzzy C-means algorithm: center-based selection, border-based selection, and bin-based selection. In center-based selection, the data with a high degree of membership in each cluster are selected for training. In border-based selection, the data around the borders between clusters are selected. In bin-based selection, the data in each cluster are sorted by their membership degrees, the sorted data of each cluster are divided into bins, and one sample is selected from each bin for training. The effects of the three methods are studied empirically on a set of UCI data sets. Experimental results indicate that bin-based selection can effectively improve learning performance compared with randomly selecting training samples.
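    The abstract describes the three selection rules only in words. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. It assumes a plain fuzzy c-means with fuzzifier m = 2, hard assignment of each sample to its highest-membership cluster, a border score equal to the gap between a sample's two largest memberships (small gap = near a border), and a uniformly random pick inside each bin. All function names (`fuzzy_c_means`, `center_based`, `border_based`, `bin_based`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuzzy_c_means(data: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  max_iter: int = 300, tol: float = 1e-6,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Plain fuzzy c-means. Returns (centers, U); U[i][k] is the membership
    of sample i in cluster k, and every row of U sums to 1."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers: Matrix = []
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * data[i][j] for i in range(n)) / sw
                            for j in range(d)])
        # Membership update: u_ik = 1 / sum_l (d_ik / d_il) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(data[i], ck), 1e-12) for ck in centers]
            new_row = [1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0))
                                 for l in range(c)) for k in range(c)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, U


def _members_by_membership(U: Matrix) -> List[List[int]]:
    # Assumption: hard-assign each sample to its highest-membership cluster,
    # then sort each cluster's members by membership, highest first.
    labels = [max(range(len(row)), key=row.__getitem__) for row in U]
    groups = []
    for k in range(len(U[0])):
        members = [i for i, lab in enumerate(labels) if lab == k]
        members.sort(key=lambda i: U[i][k], reverse=True)
        groups.append(members)
    return groups


def center_based(U: Matrix, per_cluster: int) -> List[int]:
    # Keep the per_cluster samples with the highest membership in each cluster.
    return sorted(i for g in _members_by_membership(U) for i in g[:per_cluster])


def border_based(U: Matrix, count: int) -> List[int]:
    # Assumed border score: gap between the two largest memberships of a
    # sample (requires c >= 2); the smallest gaps lie between clusters.
    def gap(i: int) -> float:
        top = sorted(U[i], reverse=True)
        return top[0] - top[1]
    return sorted(sorted(range(len(U)), key=gap)[:count])


def bin_based(U: Matrix, bins_per_cluster: int, seed: int = 0) -> List[int]:
    # Split each cluster's sorted members into contiguous bins of near-equal
    # size and draw one sample at random from every non-empty bin.
    rng = random.Random(seed)
    chosen = []
    for g in _members_by_membership(U):
        for b in range(bins_per_cluster):
            lo = b * len(g) // bins_per_cluster
            hi = (b + 1) * len(g) // bins_per_cluster
            if hi > lo:
                chosen.append(rng.choice(g[lo:hi]))
    return sorted(chosen)
```

    For example, with two well-separated groups of 2-D points plus one point halfway between them, `fuzzy_c_means(data, c=2)` gives that middle point memberships close to 0.5/0.5, so `border_based(U, 1)` picks it, while `center_based` never does. Bin-based selection differs from both by drawing from every membership range, from cluster cores down to the cluster edges.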
  • Keywords
    fuzzy set theory; learning (artificial intelligence); pattern clustering; bin-based selection; border-based selection; center-based selection; fuzzy C-means; randomly selecting training samples; supervised learning; training data selection; Fuzzy systems; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence). IEEE International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1098-7584
  • Print_ISBN
    978-1-4244-1818-3
  • Electronic_ISBN
    1098-7584
  • Type
    conf
  • DOI
    10.1109/FUZZY.2008.4630456
  • Filename
    4630456