• DocumentCode
    2400570
  • Title
    Efficient ensemble algorithm for mixed numeric and categorical data
  • Author
    Reddy, M. V Jagannatha ; Kavitha, B.
  • Author_Institution
    Dept. of CSE, Madanapalle Inst. of Technol. & Sci., Chittoor, India
  • fYear
    2010
  • fDate
    28-29 Dec. 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Most previous clustering algorithms focus on numerical data, whose inherent geometric properties can be exploited naturally to define distance functions between data points. However, much of the data in databases is categorical, where attribute values cannot be naturally ordered as numerical values. Due to the differences in the characteristics of these two kinds of data, attempts to develop criterion functions for mixed data have not been very successful. In this research, we propose a novel divide-and-conquer technique to solve this problem. First, the original mixed dataset is divided into two sub-datasets: the pure categorical dataset and the pure numeric dataset. Next, existing well-established clustering algorithms designed for different types of datasets are employed to produce corresponding clusters. Last, the clustering results on the categorical and numeric datasets are combined as a categorical dataset, on which the categorical data clustering algorithm is employed to get the final output. Our main contribution in this research is to provide an algorithmic framework for the mixed-attribute clustering problem, in which existing clustering algorithms can be easily integrated.
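The three steps in the abstract (split, cluster each part, re-cluster the combined labels) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the base algorithms, so a tiny k-means and a tiny k-modes are used here as stand-ins, and the function names `kmeans`, `kmodes`, and `cluster_mixed`, the seeding rule, and the sample data are all hypothetical.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Tiny k-means (assumed stand-in for the numeric clustering step).

    Seeds the centers with the first k distinct points, which keeps the
    sketch deterministic but is not a robust initialization.
    """
    centers = []
    for p in points:
        if list(p) not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its current members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmodes(rows, k, iters=20):
    """Tiny k-modes (assumed stand-in for the categorical clustering step).

    Uses Hamming distance and per-attribute modes; seeded like kmeans above.
    """
    modes = []
    for r in rows:
        if tuple(r) not in modes:
            modes.append(tuple(r))
        if len(modes) == k:
            break
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to the mode it mismatches on the fewest attributes.
        labels = [
            min(range(len(modes)),
                key=lambda m: sum(a != b for a, b in zip(r, modes[m])))
            for r in rows
        ]
        # Update each mode to the most frequent value per attribute.
        for m in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == m]
            if members:
                modes[m] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels


def cluster_mixed(numeric, categorical, k):
    """Divide-and-conquer sketch: cluster each part, then cluster the labels."""
    num_labels = kmeans(numeric, k)        # step 2a: pure numeric sub-dataset
    cat_labels = kmodes(categorical, k)    # step 2b: pure categorical sub-dataset
    # Step 3: treat the two label columns as a new categorical dataset.
    combined = [(f"n{a}", f"c{b}") for a, b in zip(num_labels, cat_labels)]
    return kmodes(combined, k)


# Step 1: the mixed dataset, already split into its two sub-datasets
# (hypothetical sample data; row i of each list describes the same object).
numeric = [[1.0, 2.0], [9.0, 8.0], [1.2, 1.8], [9.3, 8.1], [0.9, 2.1], [8.8, 7.9]]
categorical = [("red", "small"), ("blue", "large"), ("red", "small"),
               ("blue", "large"), ("red", "medium"), ("blue", "large")]

labels = cluster_mixed(numeric, categorical, k=2)
```

Because the final step only sees cluster labels, either base algorithm could in principle be swapped for another one suited to its data type, which is the integration property the abstract emphasizes.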
  • Keywords
    divide and conquer methods; pattern clustering; categorical data; categorical data clustering algorithm; clustering algorithms; divide-and-conquer technique; ensemble algorithm; mixed numeric data; pure categorical dataset; pure numeric dataset; Algorithm design and analysis; Clustering algorithms; Complexity theory; Data mining; Indexes; Machine learning algorithms; Partitioning algorithms; Clustering algorithms; categorical dataset; divide-and-conquer; numerical dataset;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on
  • Conference_Location
    Coimbatore
  • Print_ISBN
    978-1-4244-5965-0
  • Electronic_ISBN
    978-1-4244-5967-4
  • Type
    conf
  • DOI
    10.1109/ICCIC.2010.5705738
  • Filename
    5705738