Title :
Research and design of the crawler system in a vertical search engine
Author :
Li, Min ; Zhao, Jun ; Huang, Tinglei
Author_Institution :
Sch. of Comput. Sci., Yangtze Univ., Jingzhou, China
Abstract :
The crawler system in a vertical search engine should first normalize a representative sample web page so that it conforms to W3C standards. The normalized page can then be parsed by the visual XPath generator, which yields the desired XPath expressions. During batch data extraction, the crawler system parses the target web pages and extracts the exact data required. A vertical search engine first extracts the necessary data and segments Chinese words, and then presents the data on web pages. The data structuring process that follows data extraction distinguishes a vertical search engine from a traditional search engine. The crawler system, which extracts domain-specific information from the Internet and preprocesses it, is an indispensable part of a vertical search engine.
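The batch extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pages have already been normalized into well-formed XHTML, and it uses the limited XPath subset of Python's standard `xml.etree.ElementTree`. The field names, the XPath expressions (standing in for the output of a visual XPath generator), and the sample pages are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath expressions, standing in for what a visual XPath
# generator might produce from one representative sample page.
FIELD_XPATHS = {
    "title": ".//div[@class='item']/h1",
    "price": ".//div[@class='item']/span[@class='price']",
}


def extract_fields(xhtml, field_xpaths):
    """Parse one well-formed page and return the text found at each XPath."""
    root = ET.fromstring(xhtml)
    record = {}
    for name, xpath in field_xpaths.items():
        node = root.find(xpath)
        # A missing node or empty text yields None rather than an error.
        if node is not None and node.text:
            record[name] = node.text.strip()
        else:
            record[name] = None
    return record


def extract_batch(pages, field_xpaths):
    """Apply the same XPath expressions to every page in the batch."""
    return [extract_fields(page, field_xpaths) for page in pages]


# Hypothetical, already normalized pages that share one layout.
PAGES = [
    "<html><body><div class='item'><h1>Camera A</h1>"
    "<span class='price'>1200</span></div></body></html>",
    "<html><body><div class='item'><h1>Camera B</h1></div></body></html>",
]

records = extract_batch(PAGES, FIELD_XPATHS)
for record in records:
    print(record)
```

Each page yields one structured record, which is the kind of output that a later word segmentation and indexing stage could consume. Real-world HTML is rarely well-formed, so a normalization pass would have to run before this parsing step.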
Keywords :
Internet; search engines; Chinese words; W3C standards; Web page; batch-data-extraction; crawler system; data extraction; data structuring process; vertical search engine; visual XPath generator; Artificial intelligence; Educational institutions; Engines; Standards; word segmentation
Conference_Title :
Intelligent Computing and Integrated Systems (ICISS), 2010 International Conference on
Conference_Location :
Guilin
Print_ISBN :
978-1-4244-6834-8
DOI :
10.1109/ICISS.2010.5657110