Title :
Understanding mathematical expressions from document images
Author :
Ha, Jaekyu ; Haralick, Robert M. ; Phillips, Ihsin T.
Author_Institution :
Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA
Abstract :
This paper proposes a system that understands mathematical expressions in binarized printed document images. The system first extracts a set of “primitives” from the image. Each extracted primitive is associated with a “bounding box” and a label. Using the attributes of the primitives, the system constructs an initial hierarchy; this step includes merging groups of primitives into key words. Next, the system checks the validity of the hierarchy against conventional mathematical syntax rules. If a syntax error is detected, the system attempts to correct it. The correction step includes reconfiguring the initial hierarchy, revisiting the original image for possibly missed primitives, placing dummy primitives into missing spots in the hierarchy, and so on. The corrected hierarchical structure can be converted into the format of a particular publication system such as TeX.
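As a rough illustration of two steps named in the abstract (merging primitives into a key word, and placing dummy primitives into missing spots), the following is a minimal sketch and not the authors' implementation. The names `Primitive`, `KEYWORDS`, `merge_keywords`, `fill_missing_operands`, the `max_gap` threshold, and the `?` dummy label are all hypothetical, and the syntax rule used here (a binary operator must be followed by an operand) is a deliberately simplified stand-in for the paper's syntax rules.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    """A labeled symbol with its bounding box (hypothetical representation)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


# Assumed key-word list and operator set; the paper's actual sets are not given.
KEYWORDS = {"sin", "cos", "log", "lim"}
BINARY_OPS = {"+", "-", "="}
DUMMY = "?"  # placeholder label for a missing primitive


def merge_keywords(prims, max_gap=2.0):
    """Merge runs of horizontally adjacent primitives that spell a key word."""
    prims = sorted(prims, key=lambda p: p.x0)
    lengths = sorted({len(k) for k in KEYWORDS}, reverse=True)
    out, i = [], 0
    while i < len(prims):
        for n in lengths:
            run = prims[i:i + n]
            word = "".join(p.label for p in run)
            close = all(b.x0 - a.x1 <= max_gap for a, b in zip(run, run[1:]))
            if len(run) == n and word in KEYWORDS and close:
                # The merged primitive takes the union of the bounding boxes.
                out.append(Primitive(word,
                                     run[0].x0, min(p.y0 for p in run),
                                     run[-1].x1, max(p.y1 for p in run)))
                i += n
                break
        else:
            # No key word starts here; keep the primitive unchanged.
            out.append(prims[i])
            i += 1
    return out


def fill_missing_operands(labels):
    """Insert a dummy label after any binary operator that lacks a right operand."""
    out = []
    for i, lab in enumerate(labels):
        out.append(lab)
        nxt = labels[i + 1] if i + 1 < len(labels) else None
        if lab in BINARY_OPS and (nxt is None or nxt in BINARY_OPS):
            out.append(DUMMY)
    return out
```

For example, primitives labeled `s`, `i`, `n`, `x`, `+` laid out left to right would be merged into `sin`, `x`, `+`, and the dangling `+` would then receive a dummy right operand. A real system would also have to handle vertical relations such as superscripts, subscripts, and fractions, which this sketch ignores.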
Keywords :
document image processing; binarized printed document images; bounding box; conventional mathematical syntax rules; corrected hierarchical structure; document images; dummy primitives; extracted primitives; mathematical expressions; Character recognition; Computer science; Error correction; Image converters; Image databases; Information retrieval; Merging; Optical character recognition software; Research and development; Text recognition;
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.602060