Title :
Finding Critical Cells in Web Tables with SRL: Trying to Uncover the Devil's Tease
Author :
Di Mauro, Nicola ; Esposito, Floriana ; Ferilli, Stefano
Author_Institution :
Dipt. di Inf., Univ. of Bari, Bari, Italy
Abstract :
Tables are extremely important components of documents because they convey highly informative content in a compact and structured way. Understanding a table's internal organization would make it possible to extract and reuse the data it contains. This task can be reduced to recognizing only the critical cells. Since purely algorithmic approaches cannot deal with the many different table layouts designed to represent particular kinds of information and/or particular perspectives on them, Machine Learning may offer an effective solution. On the one hand, the spatial organization of tables puts a strong emphasis on the relationships among cells; on the other hand, the extreme variability in the style, size, and aims of tables requires flexible approaches. This paper proposes a Statistical Relational Learning approach that can model the complex spatial relationships involved in a table structure by combining the power of a relational representation formalism with the flexibility of a statistical learning tool. Experiments on a real-world dataset are reported for both single-cell classification and overall table structure recognition, and the results support the validity of the proposed approach.
Keywords :
Internet; information retrieval; learning (artificial intelligence); relational databases; statistical analysis; text analysis; SRL approach; Web tables; compact informative content; complex spatial relationships; critical cell recognition; machine learning; real-world dataset; relational representation formalism; spatial table organization; statistical learning tool flexibility; statistical relational learning approach; structured informative content; table internal organization; table layouts; table structure recognition; Accuracy; High definition video; Layout; Probabilistic logic; Text analysis; Training;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.180