  • DocumentCode
    178345
  • Title
    Transforming Web Tables to a Relational Database
  • Author
    Embley, D.W. ; Nagy, G. ; Seth, S.
  • Author_Institution
    Brigham Young Univ., Provo, UT, USA
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    2781
  • Lastpage
    2786
  • Abstract
    HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract category hierarchies. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
  • Keywords
    Internet; SQL; hypermedia markup languages; query processing; relational databases; HTML; Web tables; arbitrary SQL queries; category hierarchy extraction; complex headers; isolated headers; relational database; Classification algorithms; Indexing; Layout; Pattern recognition; Wang categories; header paths; relational table SQL queries; table segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.479
  • Filename
    6977192