DocumentCode
3486521
Title
Improving Formula Analysis with Line and Mathematics Identification
Author
Alkalai, Mohamed ; Baker, Josef B. ; Sorge, Volker ; Lin, Xiaoyan
Author_Institution
Sch. of Comput. Sci., Univ. of Birmingham, Birmingham, UK
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
334
Lastpage
338
Abstract
The explosive growth of the internet and electronic publishing has made a huge number of scientific documents available to users. However, these documents are usually inaccessible to those with visual impairments and often only partially compatible with software and modern hardware such as tablets and e-readers. In this paper we revisit Maxtract, a tool for analysing and converting documents into accessible formats, and combine it with two advanced segmentation techniques: statistical line identification and machine learning formula identification. We show how these techniques improve the quality of both Maxtract's underlying document analysis and its output. We re-run and compare experimental results over a number of datasets, presenting a qualitative review of the improved output and drawing conclusions.
Keywords
document image processing; electronic publishing; image segmentation; learning (artificial intelligence); statistical analysis; Internet; Maxtract document analysis; e-readers; electronic readers; formula analysis; machine learning formula identification; mathematics identification; segmentation techniques; statistical line identification; tablets; Accuracy; Feature extraction; Histograms; Layout; Portable document format; Text recognition; Math formula recognition; formula identification; line segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.74
Filename
6628639