• DocumentCode
    3316856
  • Title
    Core-generating Approximate Minimum Entropy Discretization for Rough Set Feature Selection: An Experimental Investigation
  • Author
    Tian, David ; Keane, John ; Zeng, Xiao-Jun
  • Author_Institution
    Manchester Univ., Manchester
  • fYear
    2007
  • fDate
    23-26 July 2007
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst keeping the important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are termed reducts, and the intersection of all reducts is termed the core. As RSFS works on discrete attributes only, the real-valued attributes of a dataset must be discretized before RSFS is applied. The core size of the discretized dataset is determined by the discretization process, and previous work has shown that this core size critically affects the performance of RSFS. This paper proposes a type of discretization termed core-generating approximate minimum entropy discretization (C-GAME), which selects a set of minimum entropy cuts capable of generating discrete data with nonempty cores. The paper defines C-GAME and then models it as a constraint satisfaction optimization problem, which is solved using the branch and bound algorithm. Experiments have been performed on two datasets from the UCI database to investigate the performance of C-GAME as a pre-processing step for RSFS. Results show that, for these datasets, C-GAME outperforms both the recursive minimal entropy partition discretization method (RMEP) and the original decision trees without feature selection.
  • Keywords
    constraint theory; minimisation; pattern classification; rough set theory; tree searching; C-GAME; branch-and-bound algorithm; classifier performance; constraint satisfaction optimization problem; core-generating approximate minimum entropy discretization; discretized dataset; rough set feature selection; Classification tree analysis; Constraint optimization; Decision trees; Discrete transforms; Entropy; Genetics; Partitioning algorithms; Rough sets; Set theory; Spatial databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems Conference, 2007. FUZZ-IEEE 2007. IEEE International
  • Conference_Location
    London
  • ISSN
    1098-7584
  • Print_ISBN
    1-4244-1209-9
  • Electronic_ISBN
    1098-7584
  • Type
    conf
  • DOI
    10.1109/FUZZY.2007.4295437
  • Filename
    4295437
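
The abstract relies on two notions that a small example may make concrete: the class entropy of a candidate cut on a real-valued attribute, and the core of a discretized decision table. The following is a minimal sketch under stated assumptions, not the C-GAME method itself (the record does not give its constraints or its branch and bound formulation). The function names `class_entropy`, `cut_entropy`, `dependency` and `core` are hypothetical, and the core is computed with the standard rough set characterization: an attribute belongs to the core if removing it lowers the dependency degree of the decision on the condition attributes.

```python
from collections import Counter, defaultdict
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in a list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def cut_entropy(values, labels, cut):
    """Weighted class entropy of the two intervals induced by a single cut."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return sum(len(p) / n * class_entropy(p) for p in (left, right) if p)


def dependency(rows, labels, attrs):
    """Fraction of objects whose attrs-values determine the decision uniquely."""
    decisions = defaultdict(set)
    sizes = Counter()
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        decisions[key].add(y)
        sizes[key] += 1
    positive = sum(sizes[k] for k, ys in decisions.items() if len(ys) == 1)
    return positive / len(rows)


def core(rows, labels, attrs):
    """Attributes whose removal lowers the dependency degree."""
    full = dependency(rows, labels, attrs)
    return [a for a in attrs
            if dependency(rows, labels, [b for b in attrs if b != a]) < full]


# Toy real-valued attribute: a cut at 2.5 separates the classes perfectly.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
good_cut = cut_entropy(values, labels, 2.5)
poor_cut = cut_entropy(values, labels, 1.5)

# Toy discrete tables (attributes are column indices 0 and 1).
# Here the decision depends on attribute 0 only, so the core is [0].
core_nonempty = core([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 1, 1], [0, 1])
# Here either attribute alone suffices, so no attribute is indispensable.
core_empty = core([(0, 0), (1, 1)], [0, 1], [0, 1])
```

In the second toy table both attributes are interchangeable, so the core is empty; this is the situation the abstract says C-GAME is designed to avoid when choosing cuts.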