• DocumentCode
    3021388
  • Title
    Language identification of character images using machine learning techniques
  • Author
    Liu, Ying-Ho; Lin, Chin-Chin; Chang, Fu
  • Author_Institution
    Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
  • fYear
    2005
  • fDate
    29 Aug.-1 Sept. 2005
  • Firstpage
    630
  • Abstract
    In this paper, we propose a new approach for identifying the language type of character images. We do this by classifying individual character images to determine the language boundaries in multilingual documents. Two effective methods are considered for this purpose: the prototype classification method and support vector machines (SVM). Due to the large size of our training data set, we further propose a technique to speed up the training process for both methods. Applying the two methods to classifying characters into Chinese, English, and Japanese (including Hiragana and Katakana) has produced very accurate and comparable test results.
  • Keywords
    character recognition; document image processing; image classification; natural languages; support vector machines; character images; classification method; language identification; machine learning; multilingual documents; Character recognition; Image analysis; Machine learning; Natural languages; Shape; Support vector machine classification; Support vector machines; Testing; Text analysis; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR 2005)
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type
    conf
  • DOI
    10.1109/ICDAR.2005.149
  • Filename
    1575621