DocumentCode
3316856
Title
Core-generating Approximate Minimum Entropy Discretization for Rough Set Feature Selection: An Experimental Investigation
Author
Tian, David ; Keane, John ; Zeng, Xiao-Jun
Author_Institution
Manchester Univ., Manchester
fYear
2007
fDate
23-26 July 2007
Firstpage
1
Lastpage
6
Abstract
Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes while keeping the important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are termed reducts, and the intersection of all reducts is termed the core. Because RSFS works on discrete attributes only, the real-valued attributes of a dataset must be discretized before RSFS is applied. The core size of the discretized dataset is determined by the discretization process, and previous work has shown that this core size critically affects the performance of RSFS. This paper proposes a type of discretization termed core-generating approximate minimum entropy discretization (C-GAME), which selects a set of minimum entropy cuts capable of generating discrete data with a nonempty core. The paper defines C-GAME and then models it as a constraint satisfaction optimization problem, which is solved using a branch-and-bound algorithm. Experiments have been performed on two datasets from the UCI database to investigate the performance of C-GAME as a pre-processing step for RSFS. Results show that, for these datasets, C-GAME outperforms both the recursive minimal entropy partition discretization method (RMEP) and decision trees trained without feature selection.
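The abstract relies on two ingredients: minimum-entropy cut points (as used by RMEP) and the rough-set core of a discretized table. The Python sketch below illustrates both on made-up toy data. It is not the authors' C-GAME procedure: no branch-and-bound search over cut sets is shown, and all function names are hypothetical. The core computation assumes a consistent decision table, for which the core equals the set of attributes that appear as singleton entries of the discernibility matrix.

```python
from bisect import bisect_left
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def cut_entropy(values, labels, cut):
    """Weighted class entropy after splitting one attribute at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def min_entropy_cut(values, labels):
    """Midpoint between adjacent distinct values with the lowest cut entropy.

    Returns None when the attribute has a single distinct value.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda c: cut_entropy(values, labels, c), default=None)


def discretize(values, cuts):
    """Map each real value to an interval index; x <= cut falls on the left."""
    cuts = sorted(cuts)
    return [bisect_left(cuts, x) for x in values]


def core(rows, decisions):
    """Core of a consistent decision table (assumption): attributes that are
    the only attribute discerning some pair of objects with different decisions."""
    result = set()
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue
        diff = [a for a, (u, v) in enumerate(zip(rows[i], rows[j])) if u != v]
        if len(diff) == 1:
            result.add(diff[0])
    return result


# Toy data (made up): two real-valued attributes and a binary decision.
a0 = [1.0, 2.0, 3.0, 4.0]
a1 = [5.0, 5.0, 5.0, 6.0]
y = [0, 0, 1, 1]

cuts = [min_entropy_cut(a0, y), min_entropy_cut(a1, y)]  # one cut per attribute
rows = list(zip(discretize(a0, [cuts[0]]), discretize(a1, [cuts[1]])))
print(cuts, rows, core(rows, y))
```

Here each attribute gets its single lowest-entropy cut, and the resulting table happens to have a nonempty core (attribute 0). As the abstract describes it, C-GAME instead searches over sets of near-minimum-entropy cuts with nonemptiness of the core as a constraint; a check such as `core(...)` above is one plausible way to express that constraint, not necessarily the paper's formulation.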
Keywords
constraint theory; minimisation; pattern classification; rough set theory; tree searching; C-GAME; branch-and-bound algorithm; classifier performance; constraint satisfaction optimization problem; core-generating approximate minimum entropy discretization; discretized dataset; rough set feature selection; Classification tree analysis; Constraint optimization; Decision trees; Discrete transforms; Entropy; Genetics; Partitioning algorithms; Rough sets; Set theory; Spatial databases;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems Conference, 2007. FUZZ-IEEE 2007. IEEE International
Conference_Location
London
ISSN
1098-7584
Print_ISBN
1-4244-1209-9
Electronic_ISBN
1098-7584
Type
conf
DOI
10.1109/FUZZY.2007.4295437
Filename
4295437