Title :
Fuzzy–Rough Simultaneous Attribute Selection and Feature Extraction Algorithm
Author :
Maji, Pradipta ; Garai, Partha
Author_Institution :
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Abstract :
Of the huge number of attributes or features present in real-life data sets, only a small fraction is effective in representing the data set accurately. Selecting or extracting relevant and significant features prior to analysis is therefore an important preprocessing step in pattern recognition, data mining, and machine learning. In this regard, a novel dimensionality reduction method based on fuzzy-rough sets is presented that simultaneously selects attributes and extracts features using the concept of feature significance. The method maximizes both the relevance and the significance of the reduced feature set, thereby removing redundancy from it. This paper also uses classical and neighborhood rough sets to compute the relevance and significance of the feature set and compares their performance with that of fuzzy-rough sets in terms of the predictive accuracy of the nearest neighbor rule, support vector machine, and decision tree. An important finding is that the proposed fuzzy-rough-set-based method is more effective than the classical and neighborhood rough-set variants at generating a relevant and significant feature subset. Its effectiveness, along with a comparison with existing attribute selection and feature extraction methods, is demonstrated on real-life data sets.
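To make the notions of relevance and significance concrete, the sketch below uses the classical rough-set baseline mentioned in the abstract, not the proposed fuzzy-rough method. It assumes the standard textbook definitions: the relevance of an attribute subset B is the dependency gamma_B(D) = |POS_B(D)| / |U|, and the significance of a candidate attribute is the gain in dependency when it is added to B. The function names (`dependency`, `significance`, `greedy_select`) and the greedy forward search are illustrative assumptions rather than the authors' algorithm, and the sketch covers attribute selection only, not feature extraction.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Classical rough-set dependency gamma_B(D) = |POS_B(D)| / |U|.

    rows   : list of tuples of discrete attribute values
    labels : list of decision-class labels, one per row
    attrs  : list of attribute indices forming the subset B
    """
    if not rows:
        return 0.0
    # Group objects into equivalence classes by their values on attrs.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    # A class belongs to the positive region if all its labels agree.
    pos = sum(len(idx) for idx in classes.values()
              if len({labels[i] for i in idx}) == 1)
    return pos / len(rows)


def significance(rows, labels, selected, candidate):
    """Gain in dependency from adding `candidate` to `selected`."""
    return (dependency(rows, labels, selected + [candidate])
            - dependency(rows, labels, selected))


def greedy_select(rows, labels, n_attrs):
    """Hypothetical greedy search: repeatedly add the most significant
    attribute and stop when no attribute adds any dependency."""
    selected, remaining = [], list(range(n_attrs))
    while remaining:
        gains = {a: significance(rows, labels, selected, a) for a in remaining}
        best = max(remaining, key=lambda a: gains[a])
        if gains[best] <= 0:
            break  # every remaining attribute is redundant
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration): attribute 2 duplicates attribute 0.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
labels = ["n", "y", "y", "y"]
print(greedy_select(rows, labels, 3))
```

On this toy data, attribute 2 carries the same information as attribute 0, so its significance relative to a subset that already contains attribute 0 is zero and the search leaves it out. This is the sense in which maximizing significance removes redundancy.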
Keywords :
data analysis; data mining; data structures; decision trees; feature extraction; fuzzy set theory; learning (artificial intelligence); rough set theory; support vector machines; classical rough sets; data set analysis; data set representation; decision tree; dimensionality reduction method; feature extraction algorithm; feature selection; feature significance concept; feature subset; fuzzy-rough sets; fuzzy-rough simultaneous attribute selection; machine learning; nearest neighbor rule; neighborhood rough sets; pattern recognition; support vector machine; Approximation methods; Complexity theory; Rough sets; Silicon; Uncertainty; Attribute selection; classification
Journal_Title :
IEEE Transactions on Cybernetics
DOI :
10.1109/TSMCB.2012.2225832