Title :
Clustering massive categorical data with class association rules
Author :
Berrado, Abdelaziz ; Runger, George
Author_Institution :
Al Akhawayn Univ.
Abstract :
Clustering algorithms partition data sets into groups of objects such that the pairwise similarity between objects within the same cluster is higher than that between objects assigned to different clusters. Defining a similarity measure becomes challenging in the presence of categorical data and affects the quality and meaningfulness of the clusters formed. Furthermore, the curse of dimensionality diminishes the robustness of such measures. This paper introduces SCAR (supervised clustering with association rules), a nontraditional algorithm for clustering massive high-dimensional categorical data. SCAR is robust to the curse of dimensionality; it relies on association rules as an intuitive way to evaluate the similarity between objects and group them.
Keywords :
data mining; pattern clustering; SCAR; class association rules; clustering algorithms; clustering massive categorical data; supervised clustering; Association rules; Clustering methods; Entropy; Euclidean distance; Mutual information; Partitioning algorithms; Robustness; Supervised learning; Topology
Conference_Title :
Innovations in Information Technology, 2008. IIT 2008. International Conference on
Conference_Location :
Al Ain
Print_ISBN :
978-1-4244-3396-4
Electronic_ISBN :
978-1-4244-3397-1
DOI :
10.1109/INNOVATIONS.2008.4781693