  • DocumentCode
    3598445
  • Title
    Research of theme crawling strategy based on genetic algorithm
  • Author
    Yifeng, Chen ; Hengkai, Zhao ; Xiaoqing, Yu ; Wanggen, Wan
  • Author_Institution
    School of Communication and Information Engineering, Shanghai University, Shanghai, 200072
  • fYear
    2009
  • Firstpage
    472
  • Lastpage
    475
  • Abstract
    To address the subject drift problem in topic-focused crawling, this paper presents a theme crawling strategy for web crawlers. Built on a genetic algorithm, the strategy incorporates the PageRank algorithm and the relevance between a web page and the theme, redefines the fitness function, and adjusts the values of the related parameters used in the calculation. In this way, individuals with superior genes are selected first and subject drift is reduced. Compared with previous strategies based on genetic algorithms, the number of crawled web pages relevant to the crawling subject increases by more than 5%.
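    The abstract does not give the paper's exact fitness formula or parameter values, so the following is only a minimal sketch of the general idea, under stated assumptions: PageRank is computed by plain power iteration, page/theme relevance is assumed to be cosine similarity of term-frequency vectors, the fitness is a hypothetical weighted sum `alpha * relevance + (1 - alpha) * normalized PageRank` with an assumed `alpha = 0.7`, and roulette-wheel selection stands in for the genetic algorithm's selection step. All function names (`pagerank`, `cosine_relevance`, `fitness`, `roulette_select`) and the toy link graph are illustrative, not taken from the paper.

```python
import math
import random
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def cosine_relevance(text, topic_terms):
    """Assumed relevance measure: cosine similarity of term-frequency vectors."""
    page = Counter(text.lower().split())
    topic = Counter(t.lower() for t in topic_terms)
    dot = sum(page[t] * topic[t] for t in topic)
    norm = math.sqrt(sum(v * v for v in page.values())) * math.sqrt(
        sum(v * v for v in topic.values()))
    return dot / norm if norm else 0.0


def fitness(url, texts, ranks, topic_terms, alpha=0.7):
    """Hypothetical fitness: weighted theme relevance plus max-normalized PageRank."""
    max_rank = max(ranks.values())
    return (alpha * cosine_relevance(texts[url], topic_terms)
            + (1 - alpha) * ranks[url] / max_rank)


def roulette_select(candidates, scores, k, rng):
    """Fitness-proportional (roulette-wheel) selection of up to k distinct URLs."""
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < k:
        total = sum(scores[u] for u in pool)
        if total == 0:
            pick = rng.choice(pool)
        else:
            r = rng.random() * total  # r lies in [0, total)
            acc, pick = 0.0, pool[-1]
            for u in pool:
                acc += scores[u]
                if acc > r:
                    pick = u
                    break
        chosen.append(pick)
        pool.remove(pick)
    return chosen


# Toy example: a tiny link graph and page texts (illustrative data only).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
texts = {
    "a": "genetic algorithm crawler",
    "b": "cooking recipes",
    "c": "focused crawler genetic algorithm pagerank",
    "d": "sports news",
}
topic = ["genetic", "algorithm", "crawler"]

ranks = pagerank(links)
scores = {u: fitness(u, texts, ranks, topic) for u in texts}
frontier = roulette_select(texts, scores, k=2, rng=random.Random(0))
print(frontier)
```

    Blending relevance with PageRank means a highly linked but off-topic page ("b") cannot outscore an on-topic page ("c") on link structure alone, which is the kind of effect the abstract attributes to reducing subject drift; the real weighting used in the paper may differ.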
  • Keywords
    Focused Crawler; Genetic Algorithm; PageRank Algorithm; Web Information;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Title
    Wireless Mobile and Computing (CCWMC 2009), IET International Communication Conference on
  • Type
    conf
  • Filename
    5521971