DocumentCode
3489426
Title
An OCR System with OCRopus for Scientific Documents Containing Mathematical Formulas
Author
Furukori, F. ; Yamazaki, Shumpei ; Miyagishi, T. ; Shirai, Keigo ; Okamoto, Mitsuo
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
1175
Lastpage
1179
Abstract
This paper describes the integration of a mathematical formula recognition module into an open-source OCR system, OCRopus. In particular, we consider the identification of inline formulas using existing modules. Text lines that include mathematical formulas are first processed with an N-gram language model, which reduces the number of formula candidates by thresholding the conditional probability of words. The remaining formula candidates are then classified as formulas or text by an SVM using geometric features associated with the bounding boxes of symbols.
Keywords
document image processing; geometry; optical character recognition; probability; support vector machines; OCRopus; SVM; conditional probability; geometric features; mathematical formula recognition module; n-gram language model; open source OCR system; scientific documents; text lines; Accuracy; Image recognition; Layout; Mathematical model; Optical character recognition software; Support vector machines; Text recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.238
Filename
6628799