Title :
Evaluating the Performance of Similarity Measures Used in Document Clustering and Information Retrieval
Author :
Subhashini, R. ; Kumar, V.J.S.
Author_Institution :
Sathyabama Univ., Chennai, India
Abstract :
This paper presents the results of an experimental study of several similarity measures used for both information retrieval and document clustering. Our results indicate that the cosine similarity measure is superior to the other measures we tested, such as the Jaccard and Euclidean measures. The cosine similarity measure is particularly well suited to text documents. Previously, these measures have been compared on conventional text datasets; the proposed system instead collects its datasets through an API that returns a collection of XML pages. These XML pages are parsed and filtered to obtain the web document datasets. In this paper, we compare and analyze the effectiveness of these measures on these web document datasets.
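The three measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, the set-based form of the Jaccard measure, and all function names (term_vector, cosine_similarity, jaccard_similarity, euclidean_distance) are assumptions made here for illustration only.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumed weighting) from whitespace tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Set-based Jaccard coefficient: shared terms over all distinct terms."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def euclidean_distance(a, b):
    """Euclidean distance between two vectors (a distance, so lower means closer)."""
    # Counter returns 0 for terms that are absent from a document.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


if __name__ == "__main__":
    d1 = term_vector("web document clustering")
    d2 = term_vector("web document retrieval")
    print(cosine_similarity(d1, d2))
    print(jaccard_similarity(d1, d2))
    print(euclidean_distance(d1, d2))
```

For the two sample strings, which share two of their three terms, the cosine similarity is 2/3, the Jaccard coefficient is 0.5, and the Euclidean distance is the square root of 2. Note that the Euclidean measure is a distance rather than a similarity, so any comparison against the other two has to invert or rescale it first.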
Keywords :
Internet; XML; application program interfaces; information retrieval; pattern clustering; text analysis; API; Web document datasets; XML page filtering; XML page parsing; cosine similarity measure; document clustering; performance evaluation; similarity measures; text documents; Clustering algorithms; Context; Euclidean distance; Web mining
Conference_Title :
Integrated Intelligent Computing (ICIIC), 2010 First International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-7963-4
Electronic_ISBN :
978-0-7695-4152-5
DOI :
10.1109/ICIIC.2010.42