DocumentCode
2145685
Title
Classical Mongolian Words Recognition in Historical Document
Author
Gao, Guanglai ; Su, Xiangdong ; Wei, Hongxi ; Gong, Yeyun
Author_Institution
Sch. of Comput. Sci., Inner Mongolia Univ., Hohhot, China
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
692
Lastpage
697
Abstract
Many classical Mongolian historical documents are preserved in image form, which makes them difficult to explore and retrieve. In this paper, we investigate the peculiarities of classical Mongolian documents and propose an approach to recognizing the words in them. We design an algorithm that segments Mongolian words into Glyph Units (GUs), each consisting of no more than three characters. We then use a three-stage method to recognize the GUs. In the first stage, a decision tree classifies all GUs into nine groups using three features of the GUs. In the second stage, the GUs in each group are classified by five independent BP (backpropagation) neural networks, whose inputs are five other feature vectors of the GUs. In the last stage, the five classifier outputs for the GUs in each group are combined to give the final recognition result. The word recognition rate in our experiment reaches 71%, indicating that the method is effective.
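The three-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the five results are combined, so majority voting is an assumption here, and the names `combine_by_majority`, `recognize_gu`, `assign_group`, and `classifiers_by_group` are hypothetical. The grouping function and the per-group classifiers are passed in as plain callables standing in for the decision tree and the BP neural networks.

```python
from collections import Counter
from typing import Callable, Mapping, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def combine_by_majority(labels: Sequence[str]) -> str:
    """Stage 3 (assumed rule): return the most frequent label.

    Ties are broken in favour of the label that appears first in `labels`.
    """
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def recognize_gu(
    coarse_features: Features,
    fine_feature_vectors: Sequence[Features],
    assign_group: Callable[[Features], int],
    classifiers_by_group: Mapping[int, Sequence[Classifier]],
) -> str:
    """Recognize one glyph unit (GU) with a group-then-classify-then-combine flow."""
    # Stage 1: pick a coarse group (a stand-in for the decision tree).
    group = assign_group(coarse_features)
    classifiers = classifiers_by_group[group]
    if len(classifiers) != len(fine_feature_vectors):
        raise ValueError("one feature vector is required per classifier")
    # Stage 2: each classifier sees its own feature vector.
    votes = [clf(vec) for clf, vec in zip(classifiers, fine_feature_vectors)]
    # Stage 3: merge the individual decisions into one label.
    return combine_by_majority(votes)
```

Keeping the grouping and classification steps as injected callables means any trained model can be substituted without changing the combination logic.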
Keywords
backpropagation; decision trees; document image processing; history; image classification; image segmentation; information retrieval; natural language processing; neural nets; vectors; BP neural networks; Mongolian words segmentation; classical Mongolian historical documents; classical Mongolian words recognition; decision tree; feature vectors; image form; recognition rate; several glyph units; three-stage method; word recognition; Character recognition; Decision trees; Handwriting recognition; Image recognition; Image segmentation; Principal component analysis; Support vector machine classification; Classical Mongolian; Mongolian Segmentation; Multi-classifier Combination; off-line Handwritten Recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.145
Filename
6065400