Title :
An architecture for web information agents
Author :
Sleiman, Hassan A. ; Corchuelo, Rafael
Author_Institution :
ETSI Inf., Univ. of Seville, Sevilla, Spain
Abstract :
Many authors are researching information extraction techniques to transform the semi-structured information in typical web pages into structured information. When a researcher devises a new technique, he or she has to validate it, which requires implementing it, experimenting, gathering precision and recall results, comparing it to others, and drawing conclusions. This involves an array of details that are specific to the technique, but also many others that are actually shared with other proposals. Unfortunately, the literature does not provide a single up-to-date platform to guide software engineers and researchers in the design and implementation of information extractors. In this paper, we present a platform to design and implement learners of information extraction rules. Due to space constraints, we focus on the class of learners that learn hierarchical transducers. We have implemented our platform, and we have validated it by means of three case studies.
Keywords :
Internet; information retrieval; learning (artificial intelligence); software agents; software engineering; Web information agents; Web pages; hierarchical transducers; information extraction techniques; learning; semi-structured information; software engineers; Data mining; Indexes; Particle separators; Proposals; Software; Transducers; Web pages; Information extraction; learning rules;
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
Conference_Location :
Cordoba
Print_ISBN :
978-1-4577-1676-8
DOI :
10.1109/ISDA.2011.6121624