DocumentCode
2805790
Title
On Clustering Validity Measures and the Rough Set Theory
Author
Arco, Leticia ; Bello, Rafael ; Garcia, Maria M.
Author_Institution
Central University of Las Villas, Cuba
fYear
2006
fDate
Nov. 2006
Firstpage
168
Lastpage
177
Abstract
Document clustering has been investigated for use in different areas of text mining and information retrieval. A clustering depends on the chosen clustering algorithm as well as on the algorithm's parameter settings; for that reason it is necessary to find the best among several clustering techniques. However, it is very difficult to evaluate a given clustering of documents. There are external, internal and relative measures. The disadvantage of external measures is the necessity of a human reference classification to evaluate the clustering. In this paper we propose the use of rough-set-based measures for document clustering evaluation, basing our calculations solely on the clustering that has to be evaluated. Thus, two advantages of rough set theory are used: it does not need any preliminary or additional information about data, and it is a tool for use in computer applications in circumstances which are characterized by vagueness and uncertainty (this is the case of document clustering). We illustrate the use of the novel measures.
Keywords
Area measurement; Clustering algorithms; Computer applications; Computer science; Density measurement; Humans; Information retrieval; Machine learning; Set theory; Text mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Artificial Intelligence, 2006. MICAI '06. Fifth Mexican International Conference on
Conference_Location
Mexico City, Mexico
Print_ISBN
0-7695-2722-1
Type
conf
DOI
10.1109/MICAI.2006.36
Filename
4022150