  • DocumentCode
    2173202
  • Title
    Speeding-up Chinese character recognition in an automatic document reading system
  • Author
    Tseng, Yi-Hong; Kuo, Chi-Chang; Lee, Hsi-Jian
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • Volume
    2
  • fYear
    1997
  • fDate
    18-20 Aug 1997
  • Firstpage
    629
  • Abstract
    We present two techniques for speeding up character recognition. Our character recognition system, which includes candidate cluster selection and detail matching modules, is implemented using two statistical features: crossing counts and contour direction counts. In the training stage, we divide characters into different clusters. To keep a very high recognition rate, the candidate cluster selection module selects the 60 clusters with minimal distances from among 300 predefined clusters. To speed up recognition further, we use a modified branch and bound algorithm in the detail matching module. In the automatic document reading system, characters and punctuation marks are first extracted from printed document images and sorted according to their positions and the document orientation. The system then recognizes all printed Chinese characters between pairs of punctuation marks. The results are then spoken aloud by a speech synthesis system.
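    The two-stage scheme in the abstract can be illustrated with a minimal sketch. It assumes squared Euclidean distance over plain feature vectors, and it uses partial-distance pruning as a simple stand-in for the detail matching bound; the abstract does not describe the authors' modified branch and bound algorithm, so this is not their method. All names (`select_candidate_clusters`, `bounded_distance`, `recognize`) and the data layout are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed metric, not from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_candidate_clusters(feature: Vector,
                              centers: List[Vector],
                              k: int = 60) -> List[int]:
    """Return the indices of the k cluster centers nearest to the feature,
    ordered from nearest to farthest."""
    return heapq.nsmallest(
        k, range(len(centers)),
        key=lambda i: squared_distance(feature, centers[i]))


def bounded_distance(a: Vector, b: Vector, bound: float) -> float:
    """Accumulate the squared distance term by term and stop as soon as
    the running total reaches the bound (the candidate cannot win)."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= bound:
            return bound  # pruned: no better than the current best
    return total


def recognize(feature: Vector,
              centers: List[Vector],
              members: Dict[int, List[Tuple[str, Vector]]],
              k: int = 60) -> Tuple[Optional[str], float]:
    """Detail matching restricted to the templates of the k nearest clusters.

    members[i] holds (label, template) pairs assigned to cluster i.
    """
    best_label: Optional[str] = None
    best = float("inf")
    for cluster in select_candidate_clusters(feature, centers, k):
        for label, template in members.get(cluster, []):
            d = bounded_distance(feature, template, best)
            if d < best:
                best_label, best = label, d
    return best_label, best
```

    Visiting the nearest clusters first tends to produce a small best distance early, so more of the later templates are abandoned after only a few feature terms.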
  • Keywords
    document image processing; natural languages; optical character recognition; speech synthesis; statistical analysis; tree searching; Chinese character recognition speed up; automatic document reading system; candidate cluster selection; character recognition system; contour direction counts; crossing counts; detail matching modules; document orientation; modified branch and bound algorithm; predefined clusters; printed document images; punctuation marks; speech synthesis system; statistical features; training stage; Character recognition; Clustering algorithms; Clustering methods; Computer science; Costs; Hazards; Optical character recognition software; Speech synthesis; Statistical analysis; Strips;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997
  • Conference_Location
    Ulm
  • Print_ISBN
    0-8186-7898-4
  • Type
    conf
  • DOI
    10.1109/ICDAR.1997.620581
  • Filename
    620581