DocumentCode :
1780538
Title :
Highly correlated feature set selection for data clustering
Author :
Sumalatha, M.R. ; Ananthi, M. ; Arvind, A. ; Navin, N. ; Siddarth, C.
fYear :
2014
fDate :
10-12 April 2014
Firstpage :
1
Lastpage :
4
Abstract :
Feature set selection is the process of identifying a subset of features that produces the same result as the entire set. Feature set selection helps in clustering datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithm is proposed for clustering data. The algorithm selects features based on their relevancy and redundancy factors. The selected features are then clustered according to how strongly they are correlated with each other. The main objective of this paper is to identify feature subsets that improve classification performance by constructing a minimum spanning tree (MST) between the features. The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, cluster representatives are selected using the Frequent Pattern Analysis (FPA) technique to form an effective feature set, which reduces the time required for the query evaluation process. Redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. This improves the efficiency of the data clustering process.
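The abstract does not give the formula used for Symmetric Uncertainty, so the following is only a minimal sketch assuming the standard definition for discrete variables, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information. The function names `entropy` and `symmetric_uncertainty` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetric_uncertainty(x, y):
    """Standard SU for two equal-length discrete sequences.

    Assumed definition (not confirmed by the abstract):
    SU = 2 * I(X; Y) / (H(X) + H(Y)), with values in [0, 1].
    """
    hx = entropy(x)
    hy = entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    # The joint entropy H(X, Y) is computed over paired observations.
    hxy = entropy(list(zip(x, y)))
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


if __name__ == "__main__":
    feature = [0, 1, 0, 1]
    label = [0, 1, 0, 1]
    print(symmetric_uncertainty(feature, label))
```

Under this definition, a value near 1 between two features suggests redundancy, while a value near 0 between a feature and the target suggests irrelevance. How HCFS thresholds these values is not stated in the abstract.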
Keywords :
correlation methods; feature selection; pattern classification; pattern clustering; trees (mathematics); FPA technique; HCFS algorithm; MST; SU values; classification performance; cluster representatives; data clustering process; feature subset identification; frequent pattern analysis; highly correlated feature set selection; minimum spanning tree; query evaluation process; redundancy factors; relevancy factors; spanning tree construction process; symmetric uncertainty values; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Entropy; Filtering algorithms; Partitioning algorithms; Uncertainty; Entropy; Frequent Pattern Analysis; Information gain; Minimum Spanning Tree; Symmetric Uncertainty;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Recent Trends in Information Technology (ICRTIT), 2014 International Conference on
Conference_Location :
Chennai
Type :
conf
DOI :
10.1109/ICRTIT.2014.6996215
Filename :
6996215