• DocumentCode
    2145685
  • Title
    Classical Mongolian Words Recognition in Historical Document
  • Author
    Gao, Guanglai ; Su, Xiangdong ; Wei, Hongxi ; Gong, Yeyun
  • Author_Institution
    Sch. of Comput. Sci., Inner Mongolia Univ., Hohhot, China
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    692
  • Lastpage
    697
  • Abstract
    Many classical Mongolian historical documents are preserved only in image form, which makes them difficult to search and retrieve. In this paper, we investigate the peculiarities of classical Mongolian documents and propose an approach to recognize the words in them. We design an algorithm to segment Mongolian words into several Glyph Units (GUs). Each GU consists of no more than three characters. We then use a three-stage method to recognize the GUs. At the first stage, all the GUs are classified into nine groups by a decision tree using three features of the GUs. At the second stage, the GUs in each group are classified individually by five independent BP neural networks whose inputs are five other feature vectors of the GUs. At the last stage, the five results for each GU from these five classifiers are combined to produce the final recognition result. The recognition rate for Mongolian words in our experiment reaches 71%, indicating that our method is effective.
  • Keywords
    backpropagation; decision trees; document image processing; history; image classification; image segmentation; information retrieval; natural language processing; neural nets; vectors; BP neural networks; Mongolian words segmentation; classical Mongolian historical documents; classical Mongolian words recognition; decision tree; feature vectors; image form; recognition rate; several glyph units; three-stage method; word recognition; Character recognition; Decision trees; Handwriting recognition; Image recognition; Image segmentation; Principal component analysis; Support vector machine classification; Classical Mongolian; Mongolian Segmentation; Multi-classifier Combination; off-line Handwritten Recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.145
  • Filename
    6065400
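
The three-stage recognition flow described in the abstract (coarse grouping, five per-group classifiers, combination of their five results) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the abstract does not specify the three grouping features, the five feature vectors, or the combination rule. The sketch assumes each classifier returns a label-to-score mapping and that the results are combined by averaging scores, which may differ from the rule the authors actually used.

```python
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]                       # label -> confidence score
Classifier = Callable[[Sequence[float]], Scores]


def combine_by_mean(score_sets: List[Scores]) -> str:
    """Stage 3 (assumed rule): average the per-label scores, return the top label."""
    totals: Dict[str, float] = {}
    for scores in score_sets:
        for label, value in scores.items():
            totals[label] = totals.get(label, 0.0) + value
    count = len(score_sets)
    return max(totals, key=lambda label: totals[label] / count)


def recognize_gu(
    group_of: Callable[[Sequence[float]], int],
    classifiers_by_group: Dict[int, List[Classifier]],
    coarse_features: Sequence[float],
    feature_vectors: List[Sequence[float]],
) -> str:
    """Route one glyph unit (GU) through the three stages.

    group_of stands in for the stage-1 decision tree; classifiers_by_group
    holds the stage-2 classifiers (one per feature vector) for each group.
    """
    group = group_of(coarse_features)                    # stage 1: pick a group
    classifiers = classifiers_by_group[group]
    score_sets = [clf(vec) for clf, vec in zip(classifiers, feature_vectors)]  # stage 2
    return combine_by_mean(score_sets)                   # stage 3: combine
```

Averaging is only one possible combination rule; majority voting or weighted sums would slot into the same place by replacing combine_by_mean.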