• DocumentCode
    2677736
  • Title

    ROCK: a robust clustering algorithm for categorical attributes

  • Author

    Guha, Sudipto ; Rastogi, Rajeev ; Shim, Kyuseok

  • Author_Institution
    Stanford Univ., CA, USA
  • fYear
    1999
  • fDate
    23-26 Mar 1999
  • Firstpage
    512
  • Lastpage
    521
  • Abstract
    We study clustering algorithms for data with Boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for Boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm, ROCK, that employs links and not distances when merging clusters. Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study with real-life as well as synthetic data sets. Our study shows that ROCK not only generates better quality clusters than traditional algorithms, but also exhibits good scalability properties.
  • Keywords
    category theory; computational complexity; data handling; database management systems; pattern clustering; Boolean attributes; ROCK; categorical attributes; complexity results; data points; domain expert; non-metric similarity measures; robust clustering algorithm; robust hierarchical clustering algorithm; scalability properties; similarity table; similarity/proximity; synthetic data sets; Character generation; Clustering algorithms; Dairy products; Data mining; Merging; Partitioning algorithms; Pediatrics; Robustness; Transaction databases; Uniform resource locators;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 1999. Proceedings., 15th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0071-4
  • Type

    conf

  • DOI
    10.1109/ICDE.1999.754967
  • Filename
    754967