DocumentCode :
2784159
Title :
Focused crawler URL analysis model based on improved genetic algorithm
Author :
Ning, Hui ; Wu, Hao ; He, Zhongzheng ; Tan, Yazhou
Author_Institution :
Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
fYear :
2011
fDate :
7-10 Aug. 2011
Firstpage :
2159
Lastpage :
2164
Abstract :
This paper analyses the URL analysis models of existing focused crawlers, together with their pros and cons, and proposes a URL analysis model based on an improved genetic algorithm in which the selection, crossover, and mutation operators are optimized. The user query is used to construct virtual documents that participate in the genetic process. The Rocchio feedback learning algorithm is used to amend the theme vector and to compute the degree of relevance of the anchor text to the theme. Experiments show that the improved genetic algorithm can effectively collect topic-relevant pages.
Keywords :
Internet; genetic algorithms; learning (artificial intelligence); mathematical operators; Rocchio feedback learning algorithm; URL analysis model; crossover operator; focused crawler; genetic algorithm; mutation operator; selection operator; Analytical models; Computational modeling; Crawlers; Genetic algorithms; Genetics; Search engines; Web pages; Focused Crawler; Genetic algorithm; URL analysis model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Mechatronics and Automation (ICMA), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
2152-7431
Print_ISBN :
978-1-4244-8113-2
Type :
conf
DOI :
10.1109/ICMA.2011.5986315
Filename :
5986315