Title :
An Architectural Framework of a Crawler for Retrieving Highly Relevant Web Documents by Filtering Replicated Web Collections
Author :
Shekhar, Shashi ; Agrawal, Rohit ; Arya, Karm Veer
Author_Institution :
GLA Inst. of Technol. & Manage., Mathura, India
Abstract :
As the Web continues to grow, finding relevant information with traditional search engines has become difficult. Many index-based Web search engines exist for searching information in various domains, but the documents (URLs) they retrieve for a searched topic are often of poor quality. Because the number of Web pages is also growing rapidly, devising a personalized Web search is of great importance. This paper proposes a method to reduce the time spent browsing search results by providing a personalized Web search agent (MetaCrawler). In the proposed technique, Web pages relevant to the user's interests are ranked at the front of the result list, giving the user quick access to those links. An experiment was designed and conducted to test the performance of the proposed Web-Filtering approach. The experimental results suggest substantial improvement in the crawling strategy, especially when the search strings are short.
Keywords :
Computer networks; Crawlers; Data mining; Information filtering; Information filters; Intelligent agent; Search engines; Uniform resource locators; Web pages; Web search; Link analysis; Search result ranking; Web IR; Web crawler; Web page classification;
Conference_Title :
Advances in Computer Engineering (ACE), 2010 International Conference on
Conference_Location :
Bangalore, Karnataka, India
Print_ISBN :
978-1-4244-7154-6
DOI :
10.1109/ACE.2010.64