Title :
A topic-specific Web crawler based on content and structure mining
Author :
Rong Qian ; Kejun Zhang ; Geng Zhao
Author_Institution :
Dept. of Comput. Sci., Beijing Electron. Sci. & Technol. Inst., Beijing, China
Abstract :
This paper presents a topic-specific intelligent Web crawler based on Web content and structure mining. The method exploits the characteristics of neural networks and introduces reinforcement learning to estimate the relevance of crawled Web pages to the topic. When calculating this correlation, only the important tags in a page's HTML markup are selected to analyze the page's content and structure. Experiments show that the method clearly improves both efficiency and accuracy.
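The tag-selection idea in the abstract can be illustrated with a minimal sketch. It scores a page against a topic using only text found inside a few chosen HTML tags. This is not the authors' method: the neural network and reinforcement learning components are not reproduced, and the tag weights, the class `TagTextCollector`, and the function `topic_relevance` are assumptions made up for illustration.

```python
from html.parser import HTMLParser
import re

# Hypothetical tag weights: an assumption for illustration, not values from the paper.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "a": 1.0, "b": 0.5}


class TagTextCollector(HTMLParser):
    """Collects (word, weight) pairs for text inside the weighted tags only."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # weighted tags that are currently open
        self.words = []      # (word, weight) pairs seen so far

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Remove the most recently opened occurrence of this tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        if not self.open_tags:
            return  # text outside the selected tags is ignored
        weight = max(TAG_WEIGHTS[t] for t in self.open_tags)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.words.append((word, weight))


def topic_relevance(html, topic_terms):
    """Return the weighted share of tagged words that are topic terms (0.0 to 1.0)."""
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    total = sum(weight for _, weight in collector.words)
    if total == 0:
        return 0.0
    topic = {term.lower() for term in topic_terms}
    hits = sum(weight for word, weight in collector.words if word in topic)
    return hits / total
```

A crawler could use such a score to rank candidate pages before following their links; in the paper that decision is learned, whereas here it is a fixed weighted term overlap.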
Keywords :
Internet; data mining; hypermedia markup languages; learning (artificial intelligence); neural nets; HTML markup; Web crawler; Web page content mining; Web page structure mining; neural network; reinforcement learning; Crawlers; Data mining; Learning (artificial intelligence); Neural networks; Search engines; Uniform resource locators; Web pages; Topic-specific; crawling algorithm; reinforcement learning; web content and structure mining;
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location :
Dalian
DOI :
10.1109/ICCSNT.2013.6967153