• DocumentCode
    2536168
  • Title
    Clustering Relational Database Entities Using K-means
  • Author
    Bourennani, Farid ; Guennoun, Mouhcine ; Zhu, Ying
  • Author_Institution
    Inst. of Technol., Univ. of Ontario, Oshawa, ON, Canada
  • fYear
    2010
  • fDate
    11-16 April 2010
  • Firstpage
    143
  • Lastpage
    148
  • Abstract
    The rapid evolution of hardware and the internet has made large volumes of data more accessible. This data comprises heterogeneous data types such as text, numbers, and multimedia. Separate, non-overlapping research communities each work on processing a single homogeneous data type. Nevertheless, from the user's perspective, these heterogeneous data types should behave and be accessed in a similar fashion. Processing heterogeneous data types, known as Heterogeneous Data Mining (HDM), is a complex task. HDM by Unified Vectorization (HDM-UV) appears to be an appropriate solution to this problem because it allows heterogeneous data types to be processed simultaneously. In this paper, we use K-means and Self-Organizing Maps (SOM) to process textual and numerical data types simultaneously through UV. We evaluate how HDM-UV improves the clustering results of these two algorithms (SOM, K-means) by comparing them with traditional homogeneous data processing. Furthermore, we compare the clustering results of the two algorithms when applied to a data integration problem.
  • Keywords
    data mining; pattern clustering; relational databases; self-organising feature maps; K-means clustering; data integration; heterogeneous data mining; relational database entities; self-organizing maps; unified vectorization; Biomedical measurements; Business; Clustering algorithms; Clustering methods; Companies; Data mining; Hardware; Mining industry; Relational databases; Weight measurement; Data Integration; Heterogeneous data mining; K-means; Pre-Processing; SOM;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Databases Knowledge and Data Applications (DBKDA), 2010 Second International Conference on
  • Conference_Location
    Menuires
  • Print_ISBN
    978-1-4244-6081-6
  • Type
    conf
  • DOI
    10.1109/DBKDA.2010.32
  • Filename
    5477134
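
The abstract describes clustering textual and numerical data together by mapping both into a single vector space before running K-means. The record does not give the details of the paper's Unified Vectorization, so the following is only a minimal sketch under stated assumptions: text is encoded as normalized term frequencies, the numeric field is min-max scaled, and the two parts are concatenated. The function names `unified_vectors` and `kmeans` and the sample records are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def unified_vectors(records):
    """Map (text, number) records to one numeric vector each.

    Assumption (not from the paper): the text part is a normalized
    term-frequency vector over a shared vocabulary, the numeric part is
    min-max scaled to [0, 1], and the two parts are concatenated.
    """
    vocab = sorted({w for text, _ in records for w in text.lower().split()})
    nums = [n for _, n in records]
    lo, hi = min(nums), max(nums)
    span = (hi - lo) or 1.0  # avoid division by zero when all numbers match
    vectors = []
    for text, n in records:
        counts = Counter(text.lower().split())
        total = sum(counts.values()) or 1
        text_part = [counts[w] / total for w in vocab]
        vectors.append(text_part + [(n - lo) / span])
    return vectors


def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means (Lloyd's algorithm) with Euclidean distance."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # pick k distinct points as initial centers
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: a short text description plus one numeric attribute.
records = [
    ("customer name address", 10.0),
    ("customer address phone", 12.0),
    ("product price stock", 950.0),
    ("product stock price", 1000.0),
]
labels = kmeans(unified_vectors(records), k=2)
print(labels)
```

Because both data types end up in the same vector, a single distance computation reflects textual and numerical similarity at once; how the two parts are weighted against each other is a design choice that this sketch leaves at a simple concatenation.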