DocumentCode :
2678331
Title :
K-Means for Search Results Clustering Using URL and Tag Contents
Author :
Poomagal, S. ; Hamsapriya, T.
Author_Institution :
Dept. of Comput. & Inf. Sci., PSG Coll. of Technol., Coimbatore, India
fYear :
2011
fDate :
20-22 July 2011
Firstpage :
1
Lastpage :
7
Abstract :
The growing volume of the web floods search results with a huge collection of web documents, making it difficult for the user to find the necessary document. Clustering is a way to organize search results for easier browsing: it is the process of combining similar web documents into groups. For web page clustering, terms (features) can be extracted from different parts of a web page. Giansalvatore, Salvatore and Alessandro extracted terms from the entire web page for clustering, whereas Stanisław Osiński et al. considered terms only from snippets. This paper introduces a new method that extracts terms from the URL, Title tag and Meta tag to produce clusters of web documents. These parts of a web page are selected because they contain keywords that occur in the page. The clustering algorithm used in this paper is K-means. The proposed method is compared with snippet-based clustering in terms of intra-cluster distance and inter-cluster distance.
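The pipeline outlined in the abstract (terms from the URL, Title tag and Meta tag, then K-means, then intra- and inter-cluster distance) might be sketched roughly as follows. This is only an illustration under assumptions the abstract does not state: raw term-frequency vectors, Euclidean distance, a deterministic farthest-point initialisation, a small ad hoc stop list, and simple average-based definitions of the two distances. All function names, the sample pages and the example.com/example.org URLs are hypothetical and are not taken from the paper.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Ad hoc stop list (assumption): URL boilerplate and a few common words.
STOP = {"www", "com", "org", "net", "html", "htm", "index", "the", "and", "for"}

class TitleMetaParser(HTMLParser):
    """Collects <title> text and the content of keywords/description <meta> tags."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.chunks.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.chunks.append(data)

def extract_terms(url, html):
    """Return lowercase terms taken from the URL, Title tag and Meta tag only."""
    parser = TitleMetaParser()
    parser.feed(html)
    parsed = urlparse(url)
    text = " ".join([parsed.netloc, parsed.path] + parser.chunks).lower()
    return [t for t in re.findall(r"[a-z]+", text) if len(t) > 2 and t not in STOP]

def to_vectors(docs_terms):
    """Build raw term-frequency vectors over the shared vocabulary."""
    vocab = sorted({t for terms in docs_terms for t in terms})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for terms in docs_terms:
        v = [0.0] * len(vocab)
        for t in terms:
            v[index[t]] += 1.0
        vectors.append(v)
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(vectors, k, iterations=20):
    """Plain K-means; farthest-point initialisation is an assumption of this sketch."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def intra_cluster_distance(vectors, labels, centroids):
    """Mean distance from each document to its own centroid (assumed definition)."""
    return sum(dist(v, centroids[l]) for v, l in zip(vectors, labels)) / len(vectors)

def inter_cluster_distance(centroids):
    """Mean pairwise distance between centroids (assumed definition)."""
    pairs = [(i, j) for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    return sum(dist(centroids[i], centroids[j]) for i, j in pairs) / len(pairs)

# Hypothetical search results for the ambiguous query "jaguar".
pages = [
    ("http://example.com/cars/jaguar-xf",
     "<title>Jaguar XF car review</title><meta name='keywords' content='jaguar, car, sedan'>"),
    ("http://example.org/wildlife/jaguar",
     "<title>Jaguar the big cat</title><meta name='description' content='jaguar wildlife cat habitat'>"),
    ("http://example.com/cars/jaguar-price",
     "<title>Jaguar car price</title><meta name='keywords' content='car, price, dealer'>"),
    ("http://example.org/wildlife/cats",
     "<title>Wildlife cat species</title><meta name='keywords' content='cat, wildlife, jaguar'>"),
]
vectors = to_vectors([extract_terms(url, html) for url, html in pages])
labels, centroids = kmeans(vectors, k=2)
print(labels)
print(intra_cluster_distance(vectors, labels, centroids), inter_cluster_distance(centroids))
```

A lower intra-cluster distance together with a higher inter-cluster distance would indicate tighter, better separated clusters, which is presumably the sense in which the comparison with snippet-based clustering is made. A snippet-based baseline could be sketched by swapping `extract_terms` for a function that tokenises snippet text instead.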
Keywords :
Web sites; document handling; feature extraction; information retrieval; pattern clustering; search problems; URL; Web documents; Web page; feature extraction; k-means clustering; meta tag; search result clustering; snippet based clustering; tag content; title tag; Clustering algorithms; Ear; Feature extraction; Frequency measurement; Partitioning algorithms; Search engines; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Process Automation, Control and Computing (PACC), 2011 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-61284-765-8
Type :
conf
DOI :
10.1109/PACC.2011.5978906
Filename :
5978906