DocumentCode :
2146685
Title :
Embedding a Mathematical OCR Module into OCRopus
Author :
Yamazaki, Shinpei ; Furukori, Fumihiro ; Zhao, Qinzheng ; Shirai, Keiichiro ; Okamoto, Masayuki
Author_Institution :
Fac. of Eng., Shinshu Univ., Nagano, Japan
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
880
Lastpage :
884
Abstract :
This paper describes embedding a mathematical formula recognition module into the OCR system OCRopus, with the aim of developing an OCR system for scientific and technical documents that include mathematical formulas. OCRopus is an open-source OCR system emphasizing modularity, easy extensibility, and reuse. The system has several basic components, such as preprocessing, layout analysis, and text line recognition, so embedding the mathematical formula recognition module into it is a challenging project. We have developed a math OCR module and report how to embed it into the OCRopus system in order to realize a math OCR system that can deal with a wide variety of documents including mathematical formulas.
Keywords :
document image processing; mathematics computing; optical character recognition; text analysis; OCRopus; layout analysis; mathematical OCR module; mathematical formula recognition; optical character recognition; technical documents; text line recognition; Character recognition; Image recognition; Layout; Optical character recognition software; Text recognition; Training; White spaces; Mathematical formula recognition; OCR; OCRopus;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.180
Filename :
6065437