• DocumentCode
    479545
  • Title
    High dimensional sparse data Clustering Algorithm Based on Concept Feature Vector (CABOCFV)
  • Author
    Wu, Sen; Gu, Shujuan; Gao, Xuedong
  • Author_Institution
    Sch. of Econ. & Manage., Univ. of Sci. & Technol., Beijing
  • Volume
    1
  • fYear
    2008
  • fDate
    12-15 Oct. 2008
  • Firstpage
    202
  • Lastpage
    206
  • Abstract
    Finding clusters of data objects in high dimensional space is challenging, especially because such data can be sparse and highly skewed. This paper uses concept lattices to solve the high dimensional sparse data clustering problem. Concept lattice theory is an effective tool for data analysis and knowledge processing: it integrates the concept intent (attributes) and concept extent (objects) and describes the hierarchical relationships among concept nodes. Constructing a concept lattice is itself a process of concept clustering, but because the lattice is complete, it produces a huge number of concept nodes, and concept nodes whose extent is too large or too small are of no interest. This paper proposes an effective high dimensional sparse data clustering algorithm based on concept feature vector (CABOCFV), which reduces the redundancy of concept construction by using the concept sparse feature distance and the concept feature vector, and introduces an effective noise recognition strategy. CABOCFV is insensitive to the input order of data objects and scans the database only once. Experiments show that CABOCFV is effective and efficient for high dimensional sparse data clustering.
  • Keywords
    data analysis; data mining; pattern clustering; vectors; concept extent; concept feature vector; concept intent; concept lattice; concept sparse feature distance; data object cluster; high dimensional sparse data clustering algorithm; knowledge processing; Clustering algorithms; Computational complexity; Data analysis; Discrete wavelet transforms; Lattices; Noise reduction; Space technology; Spatial databases; Technology management; Vectors; Clustering Analysis; Concept Lattice Construction; High Dimensional Data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Service Operations and Logistics, and Informatics, 2008. IEEE/SOLI 2008. IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-2012-4
  • Electronic_ISBN
    978-1-4244-2013-1
  • Type
    conf
  • DOI
    10.1109/SOLI.2008.4686391
  • Filename
    4686391