DocumentCode :
3598445
Title :
Research of theme crawling strategy based on genetic algorithm
Author :
Yifeng, Chen ; Hengkai, Zhao ; Xiaoqing, Yu ; Wanggen, Wan
Author_Institution :
School of Communication And Information Engineering, Shanghai University, Shanghai, 200072
fYear :
2009
Firstpage :
472
Lastpage :
475
Abstract :
To address the topic drift problem in focused crawling, this paper presents a theme crawling strategy for web crawlers. Built on a genetic algorithm, the strategy incorporates the PageRank algorithm and the relevance of each web page to the theme, redefines the fitness function, and adjusts the values of the related calculation parameters. As a result, superior individuals are selected first and topic drift is reduced. Compared with previous strategies based on genetic algorithms, the number of crawled web pages relevant to the topic increases by more than 5%.
Keywords :
Focused Crawler; Genetic Algorithm; PageRank Algorithm; Web Information
fLanguage :
English
Publisher :
IET
Conference_Title :
Wireless Mobile and Computing (CCWMC 2009), IET International Communication Conference on
Type :
conf
Filename :
5521971