Title :
Research on Mathematical Formulas Extraction from Chinese Document
Author :
Tian, Xuedong ; Zhang, Liping ; Li, Haiyan ; Shi, Qingxuan
Author_Institution :
Intelligent Image & Document Process. Lab, Hebei Univ., Baoding
Abstract :
A new approach for separating mathematical formulas from ordinary text is presented. Unlike existing research, it targets Chinese mathematical documents and is oriented more toward segmentation than recognition, handling both formulas outside the text lines (isolated) and formulas inside them (embedded). The approach combines Parzen windows and Bayes' theorem: an improved Parzen window method extracts the isolated formulas from printed documents, and Bayes' theorem is used to extract the embedded formulas from the text lines. Experiments show that the combination of the two methods achieves satisfactory results.
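Illustrative note (not part of the original record): the abstract names Parzen windows and Bayes' theorem but gives no features, kernels, or parameters. The sketch below shows only the generic textbook versions of the two tools, a Gaussian-kernel Parzen density estimate and a two-class Bayes posterior built on it. It is not the authors' improved Parzen method or their pipeline. The single feature (for example, the fraction of non-Chinese symbols in a segment), the function names, the window width `h`, and the toy samples are all assumptions made for illustration.

```python
import math


def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) using a Gaussian kernel of width h."""
    if not samples or h <= 0:
        raise ValueError("need at least one sample and a positive window width")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def posterior_formula(x, formula_samples, text_samples, prior_formula=0.5, h=0.1):
    """Bayes posterior P(formula | x) from two Parzen class-conditional densities."""
    joint_formula = parzen_density(x, formula_samples, h) * prior_formula
    joint_text = parzen_density(x, text_samples, h) * (1.0 - prior_formula)
    total = joint_formula + joint_text
    # Fall back to the prior if both densities underflow to zero.
    return prior_formula if total == 0.0 else joint_formula / total


# Hypothetical training values of a single feature in [0, 1] (toy data).
formula_samples = [0.80, 0.90, 0.85]
text_samples = [0.10, 0.15, 0.20]

p_high = posterior_formula(0.85, formula_samples, text_samples)
p_low = posterior_formula(0.12, formula_samples, text_samples)
```

With these toy samples, a segment whose feature value lies near the formula samples receives a posterior close to 1, and one near the text samples receives a posterior close to 0. A real system would need features and a window width chosen from labeled document data.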
Keywords :
Bayes methods; document image processing; feature extraction; image segmentation; Bayes theorem; Chinese mathematical document; Parzen window; mathematical formula extraction; Books; Character recognition; High speed optical techniques; Image converters; Mathematics; Optical character recognition software; Text recognition; Typesetting; Embedded formulas; Formula extraction; Isolated formulas
Conference_Title :
Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on
Conference_Location :
Dalian
Print_ISBN :
1-4244-0332-4
DOI :
10.1109/WCICA.2006.1713196