DocumentCode
3598445
Title
Research of theme crawling strategy based on genetic algorithm
Author
Chen, Yifeng; Zhao, Hengkai; Yu, Xiaoqing; Wan, Wanggen
Author_Institution
School of Communication and Information Engineering, Shanghai University, Shanghai, 200072
fYear
2009
Firstpage
472
Lastpage
475
Abstract
To address the topic drift problem in topic-focused crawling, this paper presents a theme (topic) crawling strategy for web crawlers. Built on a genetic algorithm, the strategy incorporates the PageRank algorithm and the relevance between a web page and the topic into a redesigned fitness function, and adjusts the weights of the parameters used in that calculation. As a result, fitter individuals are selected first and topic drift is reduced. Compared with previous genetic-algorithm-based strategies, the number of crawled pages relevant to the target topic increases by more than 5%.
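The abstract does not state the paper's actual fitness formula or parameter values. As a minimal sketch, assuming the fitness is a linear weighting of a page's topic relevance and its max-normalized PageRank, the code below scores candidate pages in a crawl frontier and selects individuals either elitistically (fittest first) or by roulette-wheel selection. The names `Page`, `fitness`, `select_elite`, `roulette_select` and the weight `alpha = 0.7` are hypothetical illustrations, not the authors' method, and only the selection step of a genetic algorithm is shown (no crossover or mutation).

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float  # assumed topic-similarity score in [0, 1]
    pagerank: float   # assumed precomputed, non-negative PageRank score


def fitness(page: Page, max_pagerank: float, alpha: float = 0.7) -> float:
    """Hypothetical fitness: weighted sum of topic relevance and max-normalized PageRank."""
    pr = page.pagerank / max_pagerank if max_pagerank > 0 else 0.0
    return alpha * page.relevance + (1.0 - alpha) * pr


def select_elite(frontier: list[Page], k: int, alpha: float = 0.7) -> list[Page]:
    """Elitist selection: return the k fittest pages, best first."""
    max_pr = max((p.pagerank for p in frontier), default=0.0)
    return sorted(frontier, key=lambda p: fitness(p, max_pr, alpha), reverse=True)[:k]


def roulette_select(frontier: list[Page], k: int, alpha: float = 0.7,
                    rng: random.Random | None = None) -> list[Page]:
    """Fitness-proportionate (roulette-wheel) selection of k pages, with replacement."""
    rng = rng or random.Random()
    max_pr = max((p.pagerank for p in frontier), default=0.0)
    weights = [fitness(p, max_pr, alpha) for p in frontier]
    if sum(weights) <= 0:
        # No fitness signal at all: fall back to uniform sampling without replacement.
        return rng.sample(frontier, min(k, len(frontier)))
    return rng.choices(frontier, weights=weights, k=k)


frontier = [Page("a", 0.9, 0.1), Page("b", 0.2, 0.9), Page("c", 0.8, 0.5)]
print([p.url for p in select_elite(frontier, 2)])  # ['c', 'a']
```

In this toy frontier, page "b" has the highest PageRank but low topic relevance, so with `alpha = 0.7` it ranks last. Lowering `alpha` shifts priority toward link popularity, which is the kind of trade-off the abstract's parameter adjustment would govern.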
Keywords
Focused Crawler; Genetic Algorithm; PageRank Algorithm; Web Information;
fLanguage
English
Publisher
IET
Conference_Titel
IET International Communication Conference on Wireless Mobile and Computing (CCWMC 2009)
Type
conf
Filename
5521971