Title :
Metrics for information retrieval: A case study
Author :
Ravishankar, T. Nadana ; Shriram, R.
Author_Institution :
Dept. of CSE, B.S. Abdur Rahman Univ., Chennai, India
Abstract :
The domain of information retrieval (IR) has made extensive use of clustering methods. Clustering is a technique that groups a set of documents into clusters or subsets. How efficiently and effectively relevant documents can be extracted from the World Wide Web remains a challenging issue. In this work, we compare and analyse the effectiveness of similarity measures such as City Block distance, Cosine similarity, Point symmetry distance and the Dice coefficient for improving document clustering, with and without the use of an ontology. The work has two objectives: to compare the metrics used in the domain, and to study the impact of methods such as ontology comparison and clustering on the metrics as a whole. This should lead to further refinement of the metrics for current and future needs in the domain. Earlier work in the domain has highlighted that the similarity measures yield more or less the same results. However, our work shows that ontology-based clustering produces marked changes in the results. These results indicate that more work needs to focus on the metrics aspect of information retrieval.
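As a rough illustration of the four measures named in the abstract, the sketch below computes them on plain term-frequency vectors. The formulas are common textbook forms and are assumptions: the abstract does not say whether the Dice coefficient is applied to binary term sets or to real-valued weights (the extended, real-valued form is used here), nor which variant of point symmetry distance is used (the Su and Chou form, measuring how nearly another point mirrors x through a cluster centre, is used here). All function names are hypothetical. City block and point symmetry are distances (lower means closer); cosine and Dice are similarities (higher means closer).

```python
import math
from typing import Sequence

Vector = Sequence[float]


def city_block(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_coefficient(a: Vector, b: Vector) -> float:
    """Assumed real-valued Dice form: 2 * (a . b) / (|a|^2 + |b|^2)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b)
    return 2 * dot / denom if denom else 0.0


def _sub(u: Vector, v: Vector) -> list:
    return [p - q for p, q in zip(u, v)]


def _norm(u: Vector) -> float:
    return math.sqrt(sum(p * p for p in u))


def point_symmetry_distance(x: Vector, centre: Vector,
                            others: Sequence[Vector]) -> float:
    """Assumed Su-Chou point symmetry distance of x w.r.t. a cluster centre.

    For each other point y, computes
        ||(x - c) + (y - c)|| / (||x - c|| + ||y - c||),
    which is 0 when y is the exact mirror image of x through c,
    and returns the minimum over all y (math.inf if none qualify).
    """
    dx = _sub(x, centre)
    best = math.inf
    for y in others:
        dy = _sub(y, centre)
        denom = _norm(dx) + _norm(dy)
        if denom == 0:
            continue  # both points coincide with the centre; skip
        best = min(best, _norm([p + q for p, q in zip(dx, dy)]) / denom)
    return best
```

For the toy vectors [2, 1, 0, 1] and [1, 1, 1, 0], `city_block` returns 3, `cosine_similarity` about 0.707 and `dice_coefficient` about 0.667. In a K-means style clustering loop, any of these would replace the usual Euclidean distance when assigning documents to their nearest centroid.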
Keywords :
document handling; information retrieval; ontologies (artificial intelligence); pattern clustering; Dice coefficient measure; IR; World Wide Web; city block distance measure; clustering method; cosine similarity measure; document extraction; information retrieval metric; ontology clustering; ontology comparison; point symmetry distance measure; K-means; Text clustering; ontology; similarity measures;
Conference_Title :
Software Engineering and Mobile Application Modelling and Development (ICSEMA 2012), International Conference on
Conference_Location :
Chennai
Electronic_ISBN :
978-1-84919-736-6
DOI :
10.1049/ic.2012.0146