DocumentCode :
178345
Title :
Transforming Web Tables to a Relational Database
Author :
Embley, D.W. ; Nagy, G. ; Seth, S.
Author_Institution :
Brigham Young Univ., Provo, UT, USA
fYear :
2014
fDate :
24-28 Aug. 2014
Firstpage :
2781
Lastpage :
2786
Abstract :
HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract category hierarchies. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
Keywords :
Internet; SQL; hypermedia markup languages; query processing; relational databases; HTML; Web tables; arbitrary SQL queries; category hierarchy extraction; complex headers; isolated headers; relational database; Classification algorithms; HTML; Indexing; Layout; Pattern recognition; Relational databases; Wang categories; header paths; relational table SQL queries; table segmentation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
ISSN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2014.479
Filename :
6977192