Title :
Design and Implementation of a Web Information Extraction System Based on R-G-B Algorithm
Author :
Li, Yaoguo ; Sun, Huiye ; Lin, Shan ; Zhu, Mingying
Author_Institution :
College of Software, Nankai Univ., Tianjin
Abstract :
With the enormous growth of the World Wide Web in recent years, extracting information from web pages efficiently, accurately, and flexibly has become an important challenge for web crawler designers. Unlike many other approaches, the "R-G-B" algorithm is a new algorithm that meets the requirements of search engines for accurate and efficient information extraction. In this paper, we describe the design and implementation of a web information extraction system module based on this algorithm. We present the architecture of the system and report preliminary experimental results showing that the system addresses robustness, flexibility, and accuracy at a low cost.
Keywords :
Web sites; information retrieval; search engines; R-G-B algorithm; Web information extraction system; Web pages; World Wide Web; Algorithm design and analysis; Costs; Crawlers; Data mining; Hidden Markov models; Robustness; Web mining; Design and Implementation; Information Extraction; Web Crawler
Conference_Title :
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3497-8
DOI :
10.1109/IITA.2008.388