Title of article :
Evaluation of Text Clustering Methods Using WordNet
Author/Authors :
Abdelmalek Amine, Zakaria Elberrichi, Michel Simonet
Issue Information :
Journal with serial number, year 2010
Pages :
9
From page :
349
To page :
357
Abstract :
The increasing number of digitized texts now available, notably on the Web, has created an acute need for text mining techniques. Clustering systems are used more and more often in text mining, especially to analyze texts and to extract the knowledge they contain. Given the vast number of clustering algorithms and techniques available, it becomes highly confusing for a user to choose the algorithm that best suits the target dataset. In fact, it is very hard to determine which algorithms work best, since results depend considerably on the application and on the kinds of data at hand. In this paper, we propose, study and compare three text clustering methods: an ascending hierarchical clustering method, a SOM-based clustering method and an ant-based clustering method, all of them using WordNet synsets as terms for the representation of textual documents. The effects of these methods are examined in several experiments using three similarity measures: the cosine distance, the Euclidean distance and the Manhattan distance. The Reuters-21578 corpus is used for evaluation, which was carried out using the F-measure. The results obtained show that the SOM-based clustering method using the cosine distance provides the best results.
Keywords :
Text clustering, Reuters-21578, WordNet, F-measure, similarity
Journal title :
The International Arab Journal of Information Technology (IAJIT)
Serial Year :
2010
Record number :
668812