• DocumentCode
    3154634
  • Title
    Effective supervised discretization for classification based on correlation maximization
  • Author
    Zhu, Qiusha; Lin, Lin; Shyu, Mei-Ling; Chen, Shu-Ching
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Miami, Coral Gables, FL, USA
  • fYear
    2011
  • fDate
    3-5 Aug. 2011
  • Firstpage
    390
  • Lastpage
    395
  • Abstract
    In many real-world applications, the data contain continuous (numerical) features or attributes. However, many classification models accept only nominal features as input, so discretization must be applied as a pre-processing step to transform numerical data into nominal data for such models. Well-discretized data should not only characterize the original data concisely but also improve classification performance. This paper proposes a supervised discretization algorithm based on correlation maximization (CM) that uses multiple correspondence analysis (MCA), a technique for capturing the correlations among multiple variables. For each numeric feature, the correlation information generated by MCA is used to build a discretization that maximizes the correlations between the feature intervals (items) and the classes. The algorithm is compared empirically with four other commonly used discretization algorithms using six well-known classifiers. Results on five UCI datasets and five TRECVID datasets demonstrate that the proposed algorithm automatically generates a better set of features (feature intervals) by maximizing their correlations with the classes, and thereby improves classification performance.
  • Keywords
    correlation methods; optimisation; pattern classification; TRECVID datasets; UCI datasets; classification performance; concise summarization; correlation maximization; multiple correspondence analysis; nominal data; numerical data; supervised discretization algorithm; Algorithm design and analysis; Correlation; Data mining; Entropy; Matrix decomposition; Partitioning algorithms; Symmetric matrices; Classification; Correlation; Multiple Correspondence Analysis (MCA); Supervised Discretization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Reuse and Integration (IRI), 2011 IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4577-0964-7
  • Electronic_ISBN
    978-1-4577-0965-4
  • Type
    conf
  • DOI
    10.1109/IRI.2011.6009579
  • Filename
    6009579