Title :
Refinement of digitized documents through recognition of mathematical formulae
Author :
Kanahori, Toshihiro ; Suzuki, Masakazu
Author_Institution :
Res. & Support Center on Higher Educ. for the Hearing & Visually Impaired, Tsukuba Univ. of Technol., Ibaraki
Abstract :
We are developing a recognition system, named "Infty", for scientific documents, including those with mathematical formulae. In this paper, we propose a new system that refines a text-embedded PDF document by recognizing the PDF as images and integrating its text information into the recognition results of Infty. This system can be combined with other OCR systems that output their recognition results as text embedded in a PDF document. Using this system, mathematical information can be added to books, journals and papers in existing digital libraries. We evaluate the effects of this system by comparing its recognition rates with those of ABBYY FineReader. The evaluation shows that this system can add mathematical information to PDF documents generated by FineReader without loss of quality in the ordinary text parts.
Keywords :
document image processing; mathematics computing; optical character recognition; Infty; OCR systems; digital libraries; digitized document refinement; mathematical formulae recognition; mathematical information; recognition system; scientific documents; text embedded PDF document; Auditory system; Books; Chemicals; Educational technology; Error correction; Image recognition; Mathematics; Optical character recognition software; Software libraries; Text recognition;
Conference_Title :
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location :
Lyon
Print_ISBN :
0-7695-2531-8
DOI :
10.1109/DIAL.2006.35