• DocumentCode
    2805790
  • Title

    On Clustering Validity Measures and the Rough Set Theory

  • Author

    Arco, Leticia ; Bello, Rafael ; Garcia, Maria M.

  • Author_Institution
    Central University of Las Villas, Cuba
  • fYear
    2006
  • fDate
    Nov. 2006
  • Firstpage
    168
  • Lastpage
    177
  • Abstract
    Document clustering has been investigated for use in different areas of text mining and information retrieval. A clustering depends on the chosen clustering algorithm as well as on the algorithm's parameter settings; for that reason it is necessary to find the best among several clustering techniques. However, it is very difficult to evaluate a given clustering of documents. There are external, internal and relative measures. The disadvantage of external measures is the necessity of a human reference classification to evaluate the clustering. In this paper we propose the use of rough-set-based measures for document clustering evaluation, basing our calculations solely on the clustering that has to be evaluated. Thus, two advantages of rough set theory are used: it does not need any preliminary or additional information about data, and it is a tool for use in computer applications in circumstances which are characterized by vagueness and uncertainty (this is the case of document clustering). We illustrate the use of the novel measures.
  • Keywords
    Area measurement; Clustering algorithms; Computer applications; Computer science; Density measurement; Humans; Information retrieval; Machine learning; Set theory; Text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence, 2006. MICAI '06. Fifth Mexican International Conference on
  • Conference_Location
    Mexico City, Mexico
  • Print_ISBN
    0-7695-2722-1
  • Type

    conf

  • DOI
    10.1109/MICAI.2006.36
  • Filename
    4022150