DocumentCode :
1666167
Title :
Research on a dynamic adjust crawling algorithm for guiding the topic crawler through Tunnels
Author :
Xu, Chang ; Jian-guo, Xu ; Bin, Jia
Author_Institution :
College of Information and Engineering, Shan Dong University of Science and Technology, Qingdao, China
fYear :
2011
Firstpage :
1
Lastpage :
4
Abstract :
Tunneling has long been a central problem in topic crawler research. Building on the vector space model (VSM), this paper incorporates the text structure of web documents into the topic similarity measure, improving the VSM text classification algorithm so that its predictions are more accurate, and applies the result to a dynamically adjusted topic crawling algorithm that passes through tunnels. After analyzing the influence of Web Community features and tunneling, the paper also takes the genetic factors of parent and child pages into account in the web page similarity calculation. To address the shortcomings of the traditional tunnel method, the paper designs a new algorithm in which the crawler dynamically adjusts the K values during crawling according to a calculated strategy, so that Web Communities and tunnels form relatively complete thematic clusters and the web crawl rate improves.
Keywords :
Classification algorithms; Communities; Crawlers; Educational institutions; Heuristic algorithms; Prediction algorithms; Text categorization; Topic crawler; Topic similarity; Tunneling; VSM; Web Community
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
E-Business and E-Government (ICEE), 2011 International Conference on
Conference_Location :
Shanghai, China
Print_ISBN :
978-1-4244-8691-5
Type :
conf
DOI :
10.1109/ICEBEG.2011.5884527
Filename :
5884527