DocumentCode :
1680691
Title :
ViQueL: A Spatial Query Language for Presentation-Oriented Documents
Author :
Oro, E. ; Riccetti, Francesco ; Ruffolo, Massimo
Author_Institution :
DEIS, Altilia srl, Univ. of Calabria, Rende, Italy
Volume :
2
fYear :
2010
Firstpage :
345
Lastpage :
346
Abstract :
In recent years, the already considerable importance of accessing and acquiring information made available by Web pages and business documents has grown further. As a result, wrapping information from documents in HTML and PDF formats is receiving increasing interest. In this paper we present a textual query language, named ViQueL, that allows querying information in both Web and PDF documents on the basis of its spatial arrangement. The proposed language is founded on spatial grammars, i.e., context-free grammars extended with spatial constructs. The main feature of ViQueL is that it makes it possible to identify and extract relevant information from HTML and PDF documents on the basis of their visual appearance by using easy-to-write queries. Despite its considerable expressive power, the combined complexity of ViQueL is in P-Time. Moreover, experiments show that ViQueL is reasonably efficient for real-life extraction tasks.
Keywords :
hypermedia markup languages; query languages; HTML; PDF document; ViQueL; Web page; information acquiring; presentation oriented document; spatial arrangement; spatial query language; Data mining; Database languages; Grammar; HTML; Visualization; Web pages; Wrapping; Context Free Grammars; Information Extraction; Qualitative Spatial Reasoning; Wrapping;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on
Conference_Location :
Arras
ISSN :
1082-3409
Print_ISBN :
978-1-4244-8817-9
Type :
conf
DOI :
10.1109/ICTAI.2010.121
Filename :
5670086