DocumentCode
2160459
Title
Efficient focused crawling based on best first search
Author
Rawat, Seema ; Patil, D.R.
Author_Institution
Dept. of Comput. Eng., RCPIT, Dhule, India
fYear
2013
fDate
22-23 Feb. 2013
Firstpage
908
Lastpage
911
Abstract
The World Wide Web continues to grow at an exponential rate, which poses exceptional scaling challenges for general-purpose crawlers and search engines and makes fetching information on a specific topic increasingly important. This paper describes a web crawling approach based on best-first search. The goal of a focused crawler is to selectively seek out pages that are relevant to given keywords. Rather than collecting and indexing all available web documents in order to answer all possible queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant links in the document. This leads to significant savings in hardware and network resources and also helps keep the crawl more up to date. To accomplish such goal-directed crawling, we select the top k most relevant documents for a given query and then expand the most promising link, chosen according to its link score, to circumvent irrelevant regions of the web.
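The best-first expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap `relevance` function, the 50/50 weighting in the link score, the `TOY_WEB` in-memory graph, and all function names are assumptions made for illustration only (the paper's keywords suggest TF-IDF is used for relevance, which is not reproduced here). The seed list stands in for the top k relevant documents selected for the query.

```python
import heapq


def relevance(text, keywords):
    """Fraction of query keywords present in the text (assumed stand-in score)."""
    words = set(text.lower().split())
    return sum(1 for k in keywords if k.lower() in words) / len(keywords)


def best_first_crawl(pages, seeds, keywords, max_pages=10):
    """Visit pages in order of link score, highest first.

    pages maps url -> (page_text, [(anchor_text, target_url), ...]).
    Returns a list of (url, page_relevance) in visiting order.
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # unknown URL in the toy graph; nothing to fetch
        text, links = pages[url]
        page_score = relevance(text, keywords)
        visited.append((url, page_score))
        for anchor, target in links:
            if target in seen:
                continue
            seen.add(target)
            # Assumed link score: average of anchor-text and parent-page relevance.
            link_score = 0.5 * relevance(anchor, keywords) + 0.5 * page_score
            heapq.heappush(frontier, (-link_score, target))
    return visited


# Hypothetical five-page web used only to exercise the sketch.
TOY_WEB = {
    "a": ("focused crawler overview", [("crawler design", "b"), ("cooking recipes", "c")]),
    "b": ("focused crawler design notes", [("crawler frontier", "d")]),
    "c": ("pasta recipes", [("more recipes", "e")]),
    "d": ("crawler frontier queue", []),
    "e": ("dessert recipes", []),
}

order = [url for url, _ in best_first_crawl(TOY_WEB, ["a"], ["focused", "crawler"], max_pages=3)]
print(order)  # ['a', 'b', 'd']
```

With a budget of three pages, the crawl follows the on-topic links and never fetches the recipe pages, which is the resource saving the abstract attributes to focused crawling.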
Keywords
Internet; document handling; query processing; search engines; Web crawling approach; Web documents; World Wide Web; best first search; crawl boundary; exceptional scaling challenge; focused crawling; general-purpose crawlers; goal-directed crawling; keywords; link score; query; search engines; Computers; Conferences; Crawlers; Frequency conversion; Search engines; Uniform resource locators; Web pages; Focused web crawler; Query specific search; Relevancy calculation; TF-IDF;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advance Computing Conference (IACC), 2013 IEEE 3rd International
Conference_Location
Ghaziabad
Print_ISBN
978-1-4673-4527-9
Type
conf
DOI
10.1109/IAdCC.2013.6514347
Filename
6514347