Title :
Clustering of web search results using Suffix tree algorithm and avoidance of repetition of same images in search results using L-Point Comparison algorithm
Author :
Suneetha, Manne ; Fatima, S. Sameen ; Pervez, Shaik Mohd Zaheer
Author_Institution :
Dept. of Inf. Technol., Velagapudi Ramakrishna Siddhartha Eng. Coll., Vijayawada, India
Abstract :
Users of existing web search engines such as Google, Yahoo, MSN and Ask commonly find that a query returns a long ranked list of results (snippets). It is cumbersome for the user to go through each title, snippet and sometimes even the link of each search result until results relevant to the query are found. Clustering of search results is a data mining technique in which the retrieved results are organized into meaningful groups, lightening the user's work. This paper deals with the generalized suffix tree based clustering approach. The most repeated phrase in the document tags is taken as the cluster name. In short, web search results fetched from existing web search engines are grouped under phrases that contain one or more search keywords. This paper aims at organizing web search results into clusters, giving the user quick browsing options and a precise interface to the results. Suffix tree clustering produces comparatively more accurate and informative grouped results. A basic problem during image searching in any search engine is image repetition. This can be avoided by using the L-Point Comparison algorithm, a technique developed for information retrieval systems, which is also discussed with a practical example.
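The grouping idea in the abstract (snippets collected under shared phrases that contain a search keyword) can be illustrated with a minimal sketch. This is not the paper's implementation: it enumerates word n-grams instead of building a generalized suffix tree, and the function name `shared_phrase_clusters`, the parameters `max_len` and `min_docs`, and the sample snippets are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_phrase_clusters(snippets, query_terms, max_len=3, min_docs=2):
    """Group snippet indices under phrases shared by at least `min_docs` snippets.

    Simplification (assumption): word n-grams up to `max_len` words stand in
    for the phrases a generalized suffix tree would expose.
    """
    query = {term.lower() for term in query_terms}
    phrase_docs = defaultdict(set)

    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for start in range(len(words) - n + 1):
                phrase = tuple(words[start:start + n])
                # Keep only phrases that contain one or more search keywords.
                if query & set(phrase):
                    phrase_docs[phrase].add(doc_id)

    # A phrase names a cluster only if several snippets share it.
    return {
        " ".join(phrase): sorted(doc_ids)
        for phrase, doc_ids in phrase_docs.items()
        if len(doc_ids) >= min_docs
    }


# Hypothetical snippets for the query "jaguar".
snippets = [
    "jaguar car dealer",
    "jaguar car review",
    "jaguar cat habitat",
]
clusters = shared_phrase_clusters(snippets, ["jaguar"])
```

Here the shared phrase "jaguar car" labels a cluster holding the first two snippets, while "jaguar cat" occurs in only one snippet and so names no cluster. A real suffix tree would find the same shared phrases without enumerating every n-gram, and a full clustering step would typically also merge overlapping clusters.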
Keywords :
Internet; content-based retrieval; data mining; image retrieval; pattern clustering; search engines; tree data structures; trees (mathematics); Ask; Google; L-point comparison algorithm; MSN; Web search result clustering; Yahoo; cluster name; data mining; document tags; generalized suffix tree based clustering approach; image repetition avoidance; image searching; information retrieval system; query return; quick browsing option; search engines; suffix tree algorithm; Clustering algorithms; Data mining; Engines; Pixel; Search engines; Shape; Web search; Cleaning of Document; Coherent clustering; L-point image Comparison (LPC); Shared phrase; Suffix Tree Based Clustering (STBC);
Conference_Title :
Emerging Trends in Electrical and Computer Technology (ICETECT), 2011 International Conference on
Conference_Location :
Tamil Nadu
Print_ISBN :
978-1-4244-7923-8
DOI :
10.1109/ICETECT.2011.5760272