  • DocumentCode
    787109
  • Title
    Unsupervised learning with mixed numeric and nominal data
  • Author
    Li, Cen; Biswas, Gautam
  • Author_Institution
    Dept. of Comput. Sci., Middle Tennessee State Univ., Murfreesboro, TN, USA
  • Volume
    14
  • Issue
    4
  • fYear
    2002
  • Firstpage
    673
  • Lastpage
    690
  • Abstract
    Presents a similarity-based agglomerative clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure proposed by D.W. Goodall (1966) for biological taxonomy, which gives greater weight to uncommon feature value matches in similarity computations and makes no assumptions about the underlying distributions of the feature values, is adopted to define the similarity between pairs of objects. An agglomerative algorithm is employed to construct a dendrogram, and a simple distinctness heuristic is used to extract a partition of the data. The performance of the SBAC algorithm has been studied on real and artificially generated data sets. The results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other clustering schemes illustrate the superior performance of this approach.
  • Keywords
    data analysis; data mining; feature extraction; pattern clustering; pattern matching; software performance evaluation; statistical analysis; tree data structures; unsupervised learning; χ² aggregation; algorithm performance; conceptual clustering; data partition extraction; dendrogram; distinctness heuristic; feature weighting; interpretation; knowledge discovery; mixed numeric/nominal data; performance; similarity computations; similarity measure; similarity-based agglomerative clustering algorithm; uncommon feature value matches; underlying feature value distributions; unsupervised discovery tasks; Clustering algorithms; Computer aided manufacturing; Engines; Partitioning algorithms; Problem-solving; Shape; Taxonomy
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2002.1019208
  • Filename
    1019208