• DocumentCode
    3004064
  • Title
    An evolution strategies approach to the simultaneous discretization of numeric attributes in data mining
  • Author
    Valdés, Julio J.; Molina, Luis Carlos; Peris, Natán
  • Author_Institution
    Inst. for Inf. Technol., Nat. Res. Council of Canada, Ottawa, Ont., Canada
  • Volume
    3
  • fYear
    2003
  • fDate
    8-12 Dec. 2003
  • Firstpage
    1957
  • Abstract
    Many data mining and machine learning algorithms require databases in which objects are described by discrete attributes. However, attributes are very commonly measured on ratio or interval scales. In order to apply these algorithms, the original attributes must be transformed into the nominal or ordinal scale via discretization. An appropriate transformation is crucial because of its large influence on the results obtained from data mining procedures. This paper presents a hybrid technique for the simultaneous supervised discretization of continuous attributes, based on evolutionary algorithms, in particular evolution strategies (ES), combined with rough set theory and information theory. The purpose is to construct a discretization scheme for all continuous attributes simultaneously (i.e., globally) in such a way that class predictability is maximized with respect to the discrete classes generated for the predictor variables. The ES approach is applied to 17 public data sets and the results are compared with classical discretization methods. ES-based discretization not only outperforms these methods, but also leads to much simpler data models and is able to discover irrelevant attributes. These features are not present in classical discretization techniques.
  • Keywords
    data mining; evolutionary computation; information theory; learning (artificial intelligence); rough set theory; discretization technique; evolution strategies; evolutionary algorithm; machine learning; Cities and towns; Data models; Databases; Information technology; Machine learning algorithms; Mexico Council; Petroleum; Set theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2003. CEC '03. The 2003 Congress on
  • Print_ISBN
    0-7803-7804-0
  • Type
    conf
  • DOI
    10.1109/CEC.2003.1299913
  • Filename
    1299913