DocumentCode
2053494
Title
Measurement of similarity using link based cluster approach for categorical data
Author
Pavithra, M. ; Chandrakala, D.
Author_Institution
Dept. of Comput. Sci. & Eng., Kumaraguru Coll. of Technol., Coimbatore, India
fYear
2013
fDate
21-22 Feb. 2013
Firstpage
507
Lastpage
516
Abstract
Clustering groups data into clusters such that the data in the same cluster are more similar to each other than to data in different clusters. Cluster ensembles have been used to address the problem of clustering categorical data, but these techniques have been observed to generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown, which degrades the quality of the clustering result. To improve clustering quality, a new link-based approach refines the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble, and an efficient link-based algorithm is proposed for the underlying similarity assessment. This paper proposes the C-Rank link-based algorithm to improve clustering quality and to rank clusters in weighted networks. C-Rank consists of three major phases: (1) identification of candidate clusters; (2) ranking of the candidates by integrated cohesion; and (3) elimination of non-maximal clusters. Finally, to obtain the clustering result, a graph partitioning technique is applied to a weighted bipartite graph that is formulated from the refined matrix.
Keywords
data analysis; data mining; graph theory; matrix algebra; pattern clustering; C-rank link-based algorithm; candidate cluster identification; categorical data clustering problem; cluster ranking; cluster-data point relations; clustering quality; data categorization; data partition; ensemble-information matrix; graph partitioning technique; link based cluster approach; nonmaximal cluster elimination; refined matrix; similarity measurement; weighted bipartite graph; Algorithm design and analysis; Clustering algorithms; Computer science; Educational institutions; Entropy; Partitioning algorithms; Robustness; C-Rank link based cluster; Categorical data; Cluster Ensemble; Clustering; Data mining; link-based similarity; refined matrix;
fLanguage
English
Publisher
ieee
Conference_Titel
Information Communication and Embedded Systems (ICICES), 2013 International Conference on
Conference_Location
Chennai
Print_ISBN
978-1-4673-5786-9
Type
conf
DOI
10.1109/ICICES.2013.6508312
Filename
6508312