DocumentCode
2427815
Title
An Improved Shark-Search Algorithm Based on Multi-information
Author
Chen, Zhumin ; Ma, Jun ; Lei, JingSheng ; Yuan, Bo ; Lian, Li
Author_Institution
Shandong Univ., Jinan
Volume
4
fYear
2007
fDate
24-27 Aug. 2007
Firstpage
659
Lastpage
658
Abstract
With the enormous growth of the World Wide Web, the limitations of existing general-purpose search engines have become more apparent. Focused crawling is increasingly seen as a potential solution. The key to focused crawling is accurately predicting the relevance, to a given topic, of unvisited Web pages pointed to by known URLs. A formalized description of this prediction process is introduced. Four policies are then proposed to predict the relevance of unvisited pages to a topic. Combinations of these policies are further used to improve Shark-Search, a classic focused crawling algorithm based mainly on the textual information of Web pages. A large number of experiments were carried out to identify the best combination and to verify that the improved Shark-Search is more effective than the original.
Keywords
Internet; search engines; Web pages; World Wide Web; focused crawling; general-purpose search engines; improved shark-search algorithm; multiinformation; textual information; Computer science; Crawlers; Educational institutions; Heuristic algorithms; Information science; Marine animals; Search engines; Uniform resource locators; Web sites
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location
Haikou
Print_ISBN
978-0-7695-2874-8
Type
conf
DOI
10.1109/FSKD.2007.166
Filename
4406469