Title :
ViQueL: A Spatial Query Language for Presentation-Oriented Documents
Author :
Ora, E. ; Riccetti, Francesco ; Ruffolo, Massimo
Author_Institution :
DEIS, Altilia srl, Univ. of Calabria, Rende, Italy
Abstract :
In recent years, the importance of accessing and acquiring information made available by Web pages and business documents has grown considerably. Consequently, wrapping information from documents in HTML and PDF formats is receiving increasing interest. In this paper we present a textual query language, named ViQueL, that allows querying information in both Web and PDF documents on the basis of its spatial arrangement. The proposed language is founded on spatial grammars, i.e., context-free grammars extended with spatial constructs. The main feature of ViQueL is that it makes it possible to identify and extract relevant information from HTML and PDF documents on the basis of their visual appearance by using easy-to-write queries. Despite its considerable expressive power, the combined complexity of ViQueL is in PTIME. Moreover, experiments show that ViQueL is reasonably efficient for real-life extraction tasks.
Keywords :
hypermedia markup languages; query languages; HTML; PDF document; ViQueL; Web page; information acquiring; presentation oriented document; spatial arrangement; spatial query language; Data mining; Database languages; Grammar; Visualization; Web pages; Wrapping; Context Free Grammars; Information Extraction; Qualitative Spatial Reasoning
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on
Conference_Location :
Arras
Print_ISBN :
978-1-4244-8817-9
DOI :
10.1109/ICTAI.2010.121