DocumentCode
178345
Title
Transforming Web Tables to a Relational Database
Author
Embley, D.W.; Nagy, G.; Seth, S.
Author_Institution
Brigham Young Univ., Provo, UT, USA
fYear
2014
fDate
24-28 Aug. 2014
Firstpage
2781
Lastpage
2786
Abstract
HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract category hierarchies. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
Keywords
Internet; SQL; hypermedia markup languages; query processing; relational databases; HTML; Web tables; arbitrary SQL queries; category hierarchy extraction; complex headers; isolated headers; relational database; Classification algorithms; Indexing; Layout; Pattern recognition; Wang categories; header paths; relational table SQL queries; table segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location
Stockholm
ISSN
1051-4651
Type
conf
DOI
10.1109/ICPR.2014.479
Filename
6977192