Title :
Focused crawler URL analysis model based on improved genetic algorithm
Author :
Ning, Hui ; Wu, Hao ; He, Zhongzheng ; Tan, Yazhou
Author_Institution :
Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
Abstract :
This paper analyses the URL analysis models used by existing focused crawlers, together with their strengths and weaknesses, and proposes a URL analysis model based on an improved genetic algorithm in which the selection, crossover and mutation operators are optimized. The user query is used to construct virtual documents that participate in the genetic process. The Rocchio feedback learning algorithm is used to amend the theme vector and to compute the relevance of anchor text to the theme. Experiments show that the improved genetic algorithm can effectively collect topic-relevant pages.
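As an illustration of the Rocchio step mentioned in the abstract, the following is a minimal sketch of the textbook Rocchio update applied to a theme vector, followed by a cosine score for anchor text. It is not the paper's implementation: the function names, the sparse dictionary representation, the weights alpha=1.0, beta=0.75, gamma=0.15, the clipping of negative weights, and the use of raw term counts for anchor text are all assumptions made for this example.

```python
import math


def rocchio_update(theme, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Shift a theme vector toward relevant documents and away from non-relevant ones.

    Vectors are sparse dicts mapping term -> weight. The default weights are
    common textbook values, assumed here; the abstract does not specify them.
    """
    terms = set(theme)
    for doc in relevant + non_relevant:
        terms.update(doc)

    updated = {}
    for term in terms:
        # Mean weight of the term over each feedback set (0 if the set is empty).
        pos = sum(d.get(term, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(term, 0.0) for d in non_relevant) / len(non_relevant) if non_relevant else 0.0
        weight = alpha * theme.get(term, 0.0) + beta * pos - gamma * neg
        # Assumption: terms whose weight is not positive are dropped.
        if weight > 0.0:
            updated[term] = weight
    return updated


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def anchor_relevance(anchor_text, theme):
    """Score anchor text against the theme vector using raw term counts (assumed weighting)."""
    counts = {}
    for token in anchor_text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return cosine(counts, theme)


# Hypothetical data for illustration only.
theme = {"genetic": 1.0, "crawler": 1.0}
relevant = [{"crawler": 1.0, "url": 1.0}]
non_relevant = [{"recipe": 1.0}]
updated = rocchio_update(theme, relevant, non_relevant)
on_topic = anchor_relevance("focused crawler url", updated)
off_topic = anchor_relevance("cake recipe", updated)
```

In a crawler of the kind the abstract describes, a score such as `on_topic` could be used to rank the URL behind each anchor; how the paper combines this score with the genetic operators is not stated in the abstract.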
Keywords :
Internet; genetic algorithms; learning (artificial intelligence); mathematical operators; Rocchio feedback learning algorithm; URL analysis model; crossover operator; focused crawler; genetic algorithm; mutation operator; selection operator; Analytical models; Computational modeling; Crawlers; Genetic algorithms; Genetics; Search engines; Web pages; Focused Crawler; Genetic algorithm; URL analysis model;
Conference_Title :
Mechatronics and Automation (ICMA), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8113-2
DOI :
10.1109/ICMA.2011.5986315