DocumentCode :
2144505
Title :
Comparing Approaches to Mathematical Document Analysis from PDF
Author :
Baker, Josef B. ; Sexton, Alan P. ; Sorge, Volker ; Suzuki, Masakazu
Author_Institution :
Sch. of Comput. Sci., Univ. of Birmingham, Birmingham, UK
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
463
Lastpage :
467
Abstract :
Document analysis of mathematical texts is a challenging problem even for born-digital documents in standard formats. We present alternative approaches addressing this problem in the context of PDF documents. One uses an OCR approach for character recognition together with a virtual link network for structural analysis. The other uses direct extraction of symbol information from the PDF file with a two-stage parser to extract layout and expression structures. With reference to ground-truth data, we compare the effectiveness and accuracy of the two techniques quantitatively with respect to character identification and structural analysis of mathematical expressions, and qualitatively with respect to layout analysis.
Keywords :
document image processing; mathematics computing; optical character recognition; OCR approach; PDF documents; born digital documents; character recognition; expression structures; mathematical document analysis; mathematical texts; structural analysis; symbol information extraction; two stage parser; virtual link network; Character recognition; Data mining; Layout; Mathematics; Optical character recognition software; Portable document format; White spaces; Math formula recognition; PDF; layout analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.99
Filename :
6065354