Title :
Feature Selection for Density-Based Clustering
Author :
Ling, Yun ; Ye, Chongyi
Author_Institution :
Coll. of Inf., Zhejiang Gongshang Univ., Hangzhou, China
Abstract :
In recent years, the advent of high-throughput data generation techniques has increased not only the number of objects collected in databases but also the number of attributes describing these objects. Clustering is the process of grouping data into classes, or clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects. Real data are noisy because of limitations in measurement technology and experimental variability, and this noise prevents cluster models from revealing the true clusters. In this paper, we use correspondence analysis for feature selection and then apply a density-based approach for clustering. We find that combining the two methods is effective for solving practical problems. Experiments on synthetic and real-world data demonstrate the efficiency and effectiveness of our algorithm.
Keywords :
data analysis; database management systems; pattern clustering; attribute value; correspondence analysis algorithm; database management system; density-based clustering; experimental variability; feature selection; high throughput data generation technique; measurement technology limitation; Clustering algorithms; Deductive databases; Educational institutions; Electronic mail; Noise measurement; Principal component analysis; Spatial databases; Throughput; Ubiquitous computing; clustering; correspondence analysis; density-based approach; relationship
Conference_Title :
Intelligent Ubiquitous Computing and Education, 2009 International Symposium on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3619-4
DOI :
10.1109/IUCE.2009.56