Title :
Embedding a Mathematical OCR Module into OCRopus
Author :
Yamazaki, Shinpei ; Furukori, Fumihiro ; Zhao, Qinzheng ; Shirai, Keiichiro ; Okamoto, Masayuki
Author_Institution :
Fac. of Eng., Shinshu Univ., Nagano, Japan
Abstract :
This paper describes embedding a mathematical formula recognition module into the OCR system OCRopus, with the aim of developing an OCR system for scientific and technical documents that include mathematical formulas. OCRopus is an open-source OCR system that emphasizes modularity, easy extensibility, and reuse. The system already has several basic components, such as preprocessing, layout analysis, and text line recognition, so embedding a mathematical formula recognition module into it is a challenging project. We have developed a math OCR module and report how we embed it into OCRopus in order to realize a math OCR system that can deal with a wide variety of documents containing mathematical formulas.
Keywords :
document image processing; mathematics computing; optical character recognition; text analysis; OCRopus; layout analysis; mathematical OCR module; mathematical formula recognition; technical documents; text line recognition; Character recognition; Image recognition; Layout; Optical character recognition software; Text recognition; Training; White spaces; OCR
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.180