DocumentCode :
2805790
Title :
On Clustering Validity Measures and the Rough Set Theory
Author :
Arco, Leticia ; Bello, Rafael ; Garcia, Maria M.
Author_Institution :
Central University of Las Villas, Cuba
fYear :
2006
fDate :
Nov. 2006
Firstpage :
168
Lastpage :
177
Abstract :
Document clustering has been investigated for use in different areas of text mining and information retrieval. A clustering depends on the chosen clustering algorithm as well as on the algorithm's parameter settings; for that reason it is necessary to find the best among several clustering techniques. However, it is very difficult to evaluate a given clustering of documents. There are external, internal and relative measures. The disadvantage of external measures is the necessity of a human reference classification to evaluate the clustering. In this paper we propose the use of rough-set-based measures for document clustering evaluation, basing our calculations solely on the clustering that has to be evaluated. Thus, two advantages of rough set theory are used: it does not need any preliminary or additional information about data, and it is a tool for use in computer applications in circumstances which are characterized by vagueness and uncertainty (this is the case of document clustering). We illustrate the use of the novel measures.
Keywords :
Area measurement; Clustering algorithms; Computer applications; Computer science; Density measurement; Humans; Information retrieval; Machine learning; Set theory; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Artificial Intelligence, 2006. MICAI '06. Fifth Mexican International Conference on
Conference_Location :
Mexico City, Mexico
Print_ISBN :
0-7695-2722-1
Type :
conf
DOI :
10.1109/MICAI.2006.36
Filename :
4022150