Title :
Cluster the unlabeled datasets using Extended Dark Block Extraction
Author :
Asadi, Srinivasulu ; Obulesu, O.
Author_Institution :
IT Dept., JNTUA-Anantapur, Tirupati, India
Abstract :
Clustering analysis is the problem of partitioning a set of objects O = {o1, ..., on} into c self-similar subsets based on available data. In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek; 2) partitioning the data into c meaningful groups; and 3) validating the c clusters that are discovered. We address the first problem, i.e., determining the number of clusters c prior to clustering. Many clustering algorithms require the number of clusters as an input parameter, so the quality of the resulting clusters depends largely on this value. Most existing methods are post-clustering measures of cluster validity, i.e., they attempt to choose the best partition from a set of alternative partitions. In contrast, tendency assessment attempts to estimate c before clustering occurs. Here, we represent the structure of an unlabeled data set as a Reordered Dissimilarity Image (RDI), in which the pairwise dissimilarity information about a data set of n objects is displayed as an n × n image. The RDI is generated using VAT (Visual Assessment of Cluster Tendency) and highlights potential clusters as a set of dark blocks along the diagonal of the image, so the number of clusters can be estimated by counting the dark blocks along the diagonal. We develop a new method called Extended Dark Block Extraction (EDBE) for counting the number of clusters that appear as dark blocks along the diagonal of the RDI. The EDBE method combines several image and signal processing techniques.
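The reordering step behind the RDI can be illustrated with a minimal sketch of the standard VAT ordering, assuming a symmetric n × n dissimilarity matrix held in a NumPy array. The function names `vat_order`, `reorder`, and `count_diagonal_blocks` are hypothetical and chosen here for illustration. The block counter is a deliberately crude stand-in (a fixed threshold on adjacent entries of the reordered matrix); it is not the EDBE method, whose image and signal processing steps are not detailed in the abstract.

```python
import numpy as np


def vat_order(D):
    """Return a VAT-style ordering of objects for dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # dist[q] = smallest dissimilarity between q and any already-ordered object.
    dist = D[i].copy()
    for _ in range(n - 1):
        # Pick the unordered object closest to the ordered set (Prim-like step).
        j = int(np.argmin(np.where(remaining, dist, np.inf)))
        order.append(j)
        remaining[j] = False
        dist = np.minimum(dist, D[j])
    return np.array(order)


def reorder(D, order):
    """Permute rows and columns of D so similar objects sit next to each other."""
    D = np.asarray(D, dtype=float)
    return D[np.ix_(order, order)]


def count_diagonal_blocks(R, threshold):
    """Crude block count (NOT EDBE): a new block starts wherever two
    consecutive objects in the reordered matrix R are farther apart
    than `threshold`."""
    superdiag = np.diag(R, k=1)
    return 1 + int(np.sum(superdiag > threshold))


# Illustrative data: two well-separated 1-D groups, interleaved in the input.
x = np.array([0.0, 10.0, 0.1, 10.1, 0.2, 10.2])
D = np.abs(x[:, None] - x[None, :])
order = vat_order(D)
R = reorder(D, order)
print(order)
print(count_diagonal_blocks(R, threshold=R.max() / 2))
```

Displaying `R` as a grayscale image (small values dark) would show the two groups as dark squares along the diagonal. The half-of-maximum threshold is an assumption that only suits cleanly separated data such as this toy example.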
Keywords :
data analysis; image representation; pattern clustering; VAT algorithm; clustering analysis; data partitioning; extended dark block extraction; image diagonal; image processing; image representation; pair wise dissimilarity information; reordered dissimilarity image; signal processing; unlabeled data; visual assessment of cluster tendency; Algorithm design and analysis; Clustering algorithms; Data mining; Filtering; Image segmentation; Pixel; Visualization; C-Means Clustering; Cluster Count Extraction; Cluster tendency; Clustering; Reordered dissimilarity image; VAT;
Conference_Titel :
Communication and Computational Intelligence (INCOCCI), 2010 International Conference on
Electronic_ISBN :
978-81-8371-369-6