• DocumentCode
    243775
  • Title
    Discretizing Numerical Attributes in Decision Tree for Big Data Analysis
  • Author
    Yiqun Zhang; Yiu-ming Cheung
  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Baptist Univ., Hong Kong, China
  • fYear
    2014
  • fDate
    14 Dec. 2014
  • Firstpage
    1150
  • Lastpage
    1157
  • Abstract
    Decision tree induction is a typical machine learning approach that has been extensively applied to data mining and knowledge discovery. For numerical and mixed data, discretization is an essential pre-processing step of decision tree learning. However, most existing discretization approaches are not efficient enough in practice when applied to big data. Accordingly, we propose a new discretization method based on windowing and hierarchical clustering to improve the performance of the conventional decision tree for big data analysis. The proposed method not only discretizes numerical attributes faster while retaining competitive classification accuracy, but also reduces the size of the decision tree. Experiments on real data sets show the efficacy of the proposed method.
  • Keywords
    Big Data; data mining; decision trees; learning (artificial intelligence); pattern clustering; big data analysis; decision tree; discretization method; hierarchical clustering; induction learning; knowledge discovery; machine learning; numerical attribute; windowing method; Market research; Noise; Noise measurement; Discretization; Window
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4799-4275-6
  • Type
    conf
  • DOI
    10.1109/ICDMW.2014.103
  • Filename
    7022725