Title :
Transforming Web Tables to a Relational Database
Author :
Embley, D.W. ; Nagy, G. ; Seth, S.
Author_Institution :
Brigham Young Univ., Provo, UT, USA
Abstract :
HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract category hierarchies. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
Keywords :
Internet; SQL; hypermedia markup languages; query processing; relational databases; HTML; Web tables; arbitrary SQL queries; category hierarchy extraction; complex headers; isolated headers; relational database; Classification algorithms; Indexing; Layout; Pattern recognition; Wang categories; header paths; relational table SQL queries; table segmentation
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
DOI :
10.1109/ICPR.2014.479