DocumentCode :
2061430
Title :
Evaluating the Performance of Similarity Measures Used in Document Clustering and Information Retrieval
Author :
Subhashini, R. ; Kumar, V.J.S.
Author_Institution :
Sathyabama Univ., Chennai, India
fYear :
2010
fDate :
5-7 Aug. 2010
Firstpage :
27
Lastpage :
31
Abstract :
This paper presents the results of an experimental study of several similarity measures used for both information retrieval and document clustering. Our results indicate that the cosine similarity measure is superior to the other measures we tested, such as the Jaccard and Euclidean measures. The cosine similarity measure performs particularly well on text documents. Previously, these measures have been compared on conventional text datasets; the proposed system instead collects its datasets through an API that returns a collection of XML pages. These XML pages are parsed and filtered to obtain the web document datasets. In this paper, we compare and analyze the effectiveness of these measures on these web document datasets.
Keywords :
Internet; XML; application program interfaces; information retrieval; pattern clustering; text analysis; API; Web document datasets; XML page filtering; XML page parsing; cosine similarity measure; document clustering; information retrieval; performance evaluation; similarity measures; text documents; Clustering algorithms; Context; Euclidean distance; Information retrieval; Internet; XML; Document clustering; Information Retrieval; Web mining; similarity measure;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Integrated Intelligent Computing (ICIIC), 2010 First International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-7963-4
Electronic_ISBN :
978-0-7695-4152-5
Type :
conf
DOI :
10.1109/ICIIC.2010.42
Filename :
5571521