DocumentCode
479545
Title
High dimensional sparse data Clustering Algorithm Based on Concept Feature Vector (CABOCFV)
Author
Wu, Sen ; Gu, Shujuan ; Gao, Xuedong
Author_Institution
Sch. of Econ. & Manage., Univ. of Sci. & Technol., Beijing
Volume
1
fYear
2008
fDate
12-15 Oct. 2008
Firstpage
202
Lastpage
206
Abstract
Finding clusters of data objects in high-dimensional space is challenging, especially because such data can be sparse and highly skewed. This paper uses concept lattices to solve the high-dimensional sparse data clustering problem. Concept lattice theory is an effective tool for data analysis and knowledge processing: it integrates the concept intent (attributes) and the concept extent (objects) and describes the hierarchical relationships among concept nodes. Constructing a concept lattice is itself a process of concept clustering, but because the lattice is complete, it produces a huge number of concept nodes, whereas concept nodes whose extent is too large or too small are not of interest. This paper proposes an effective high-dimensional sparse data clustering algorithm based on the concept feature vector (CABOCFV), which reduces the redundancy of concept construction by using the concept sparse feature distance and the concept feature vector, and which introduces an effective noise recognition strategy. The CABOCFV algorithm is insensitive to the input order of data objects and scans the database only once. Experiments show that CABOCFV is effective and efficient for high-dimensional sparse data clustering.
Keywords
data analysis; data mining; pattern clustering; vectors; concept extent; concept feature vector; concept intent; concept lattice; concept sparse feature distance; data object cluster; high dimensional sparse data clustering algorithm; knowledge processing; Clustering algorithms; Computational complexity; Data analysis; Discrete wavelet transforms; Lattices; Noise reduction; Space technology; Spatial databases; Technology management; Vectors; Clustering Analysis; Concept Lattice Construction; High Dimensional Data
fLanguage
English
Publisher
IEEE
Conference_Titel
Service Operations and Logistics, and Informatics, 2008. IEEE/SOLI 2008. IEEE International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-2012-4
Electronic_ISBN
978-1-4244-2013-1
Type
conf
DOI
10.1109/SOLI.2008.4686391
Filename
4686391