Title :
Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches
Author :
Jensen, Richard ; Shen, Qiang
Author_Institution :
Sch. of Informatics, Edinburgh Univ., UK
Abstract :
Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome, a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (on the order of tens of thousands), which would otherwise be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed.
Keywords :
feature extraction; fuzzy set theory; learning (artificial intelligence); rough set theory; data semantics; dimensionality reduction; feature selection; feature transformation; fuzzy-rough selection; machine learning; pattern recognition; rough selection; signal processing; Decision making; Fuzzy sets; Rough sets; Set theory; Text processing; Uncertainty
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2004.96