  • DocumentCode
    1631997
  • Title
    A new data clustering approach for data mining in large databases
  • Author
    Tsai, Cheng-Fa ; Wu, Han-Chang ; Tsai, Chun-Wei
  • Author_Institution
    Dept. of Manage. Inf. Syst., Nat. Pingtung Univ. of Sci. & Technol., Taiwan
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    278
  • Lastpage
    283
  • Abstract
    Clustering is the unsupervised classification of patterns (data items, feature vectors, or observations) into groups (clusters). In data mining, clustering is very useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric-based similarity measure to partition the database so that data points in the same partition are more similar to one another than to points in different partitions. In this paper, we present a new data clustering method for data mining in large databases. Our simulation results show that the proposed clustering method performs better than a fast self-organizing map (FSOM) combined with the k-means approach (FSOM+k-means) and the genetic k-means algorithm (GKA). In addition, in all the cases we studied, our method produces much smaller errors than both the FSOM+k-means approach and GKA.
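    The distance metric-based partitioning described in the abstract can be illustrated with plain k-means, the partitioning step of the FSOM+k-means baseline. This is a minimal sketch under stated assumptions, not the paper's proposed method (which the keywords associate with an ant system). Euclidean distance as the metric and the sum of squared errors (SSE) as the error measure are assumptions, since the abstract does not define its error metric; the function names `kmeans`, `assign`, `update`, and `sse` are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Partition step: each point joins the cluster of its nearest centroid
    (assumption: Euclidean distance is the similarity measure)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, k, old):
    """Move each centroid to the mean of its members; keep the old centroid
    if a cluster ends up empty."""
    new = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            dim = len(members[0])
            new.append(tuple(sum(m[d] for m in members) / len(members)
                             for d in range(dim)))
        else:
            new.append(old[j])
    return new


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: alternate assignment and centroid update until the
    partition stops changing or `iters` rounds have run."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = assign(points, centroids)
    for _ in range(iters):
        centroids = update(points, labels, k, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid
    (assumed error measure; the abstract does not specify one)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
```

    On two well-separated groups of points, for example three points near (0, 0) and three near (10, 10), `kmeans(points, k=2)` should place each group in its own cluster, and `sse` then measures how tightly each cluster surrounds its centroid. k-means depends on its initial centroids, which is one motivation for hybrids such as FSOM+k-means and GKA.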
  • Keywords
    data mining; genetic algorithms; pattern clustering; self-organising feature maps; very large databases; FSOM+k-means approach; ant system; data clustering method; data distribution pattern discovery; data item; data mining; database partitioning; distance metric-based similarity measure; errors; fast self-organizing map; feature vectors; genetic k-means algorithm; large databases; observations; similar data points; simulation; unsupervised pattern classification; Clustering algorithms; Clustering methods; Data mining; Extraterrestrial measurements; Feedback; Iterative algorithms; Partitioning algorithms; Prototypes; Shape measurement; Spatial databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel Architectures, Algorithms and Networks, 2002. I-SPAN '02. Proceedings. International Symposium on
  • Conference_Location
    Makati City, Metro Manila
  • ISSN
    1087-4089
  • Print_ISBN
    0-7695-1579-7
  • Type
    conf
  • DOI
    10.1109/ISPAN.2002.1004300
  • Filename
    1004300