• DocumentCode
    2573375
  • Title
    Weighted Rough Clustering on categorical data
  • Author
    Fu, Jian ; Yin, Jian
  • Author_Institution
    Sch. of Inf. Sci. & Technol., Sun Yat-sen Univ., Guangzhou, China
  • fYear
    2011
  • fDate
    27-29 June 2011
  • Firstpage
    939
  • Lastpage
    944
  • Abstract
    Clustering is an unsupervised machine learning framework that has attracted much attention recently. Current clustering algorithms mainly focus on samples with real-valued attributes, while there is little work on samples represented (partly) by categorical attributes. The difficulty of processing categorical attributes is that the similarity between such samples cannot be evaluated directly by Euclidean distance, as many real-value-based methods do. We try to tackle this problem by adopting rough set theory. Rough similarity is used to define the similarity between samples. Each attribute is assigned a weight to indicate its importance for clustering, and an adaptive update process based on information gain is performed to find an optimal solution for both weights and clusters. The proposed method has three benefits: it deals with categorical data naturally; it is not sensitive to the input order of the samples to be clustered; and it optimizes both the importance of attributes and the number of clusters simultaneously. Experiments on UCI benchmark data show its effectiveness in comparison with several well-known earlier methods.
  • Keywords
    pattern clustering; pattern matching; rough set theory; unsupervised learning; Euclidean distance; UCI benchmark data; categorical data; information gain; real value attribute; unsupervised machine learning; weighted rough data clustering; Accuracy; Algorithm design and analysis; Clustering algorithms; Rocks; Set theory; clustering; rough set; rough similarity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Service System (CSSS), 2011 International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4244-9762-1
  • Type
    conf
  • DOI
    10.1109/CSSS.2011.5972099
  • Filename
    5972099
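
The abstract describes comparing categorical samples with a per-attribute weight instead of Euclidean distance. As a rough illustration of that general idea only, the sketch below computes a weighted overlap (simple matching) similarity. It is not the rough similarity defined in the paper, whose formula the abstract does not give; the function name `weighted_match_similarity`, the sample values, and the hand-picked weights are all hypothetical, and no information-gain update of the weights is attempted here.

```python
from typing import Hashable, Sequence


def weighted_match_similarity(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    weights: Sequence[float],
) -> float:
    """Weighted fraction of attributes on which two categorical samples agree.

    Assumption: a generic weighted overlap measure, used as a stand-in;
    it is NOT the paper's rough similarity.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("samples and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Add up the weights of the attributes whose categorical values match.
    matched = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return matched / total


# Hypothetical samples with three categorical attributes.
s1 = ("red", "small", "round")
s2 = ("red", "large", "round")
weights = [0.5, 0.25, 0.25]  # hand-picked, for illustration only

print(weighted_match_similarity(s1, s2, weights))  # 0.75
```

With weights like these, a mismatch on a heavily weighted attribute lowers the similarity more than a mismatch on a lightly weighted one, which is the role the abstract assigns to attribute weights.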