• DocumentCode
    2317440
  • Title

    High Dimensional Data Clustering Algorithm Based on Sparse Feature Vector for Categorical Attributes

  • Author

    Wu, Sen ; Wei, Guiying

  • Author_Institution
    Sch. of Econ. & Manage., Univ. of Sci. & Technol. Beijing, Beijing, China
  • Volume
    2
  • fYear
    2010
  • fDate
    9-10 Jan. 2010
  • Firstpage
    973
  • Lastpage
    976
  • Abstract
    This paper proposes an algorithm for clustering high-dimensional data, named Clustering Algorithm Based on Sparse Feature Vector for Categorical Attributes (CABOSFV_C). It compresses data effectively by using the "Sparse Feature Vector of a Set for Categorical Data" without losing the information needed to make clustering decisions, and it obtains the clustering result in a single data scan by defining the "Sparse Feature Dissimilarity of a Set for Categorical Data" as the distance measure. Because of the data reduction and the single-scan strategy, the algorithm has almost linear computational complexity and handles noise effectively. In addition, CABOSFV_C is suitable not only for sparse data but also for complete data; this, along with other salient features of the algorithm, is illustrated by two numerical examples at the end of the paper.
  • Keywords
    computational complexity; data compression; data mining; data reduction; decision making; pattern clustering; vectors; CABOSFVC; categorical attributes; computation complexity; data compression; data reduction; decision making; high dimensional data clustering algorithm; noise handling; sparse feature dissimilarity; sparse feature vector; Clustering algorithms; Data mining; Noise reduction; Search methods; Technology management; Virtual manufacturing; Categorical Data; Clustering; Data Mining; High Dimensionality;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Logistics Systems and Intelligent Management, 2010 International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4244-7331-1
  • Type

    conf

  • DOI
    10.1109/ICLSIM.2010.5461099
  • Filename
    5461099