• DocumentCode
    3695177
  • Title
    Language identification from handwritten documents
  • Author
    Luc Mioulet;Utpal Garain;Clément Chatelain;Philippine Barlas;Thierry Paquet
  • Author_Institution
    Laboratoire LITIS - EA 4108, Universite de Rouen, FRANCE 76800
  • fYear
    2015
  • Firstpage
    676
  • Lastpage
    680
  • Abstract
    This paper presents a novel approach for language identification in handwritten documents. The approach is based on script identification followed by character recognition. BLSTM-CTC based handwriting recognizers are used, and the OCR output is fed to a statistical language identifier that detects the language of the input handwritten document. Documents in two scripts (Latin and Bengali) and four languages (English, French, Bengali and Assamese) are considered for evaluation. Several alternative frameworks have been explored, and the effects of handwriting recognition and text length on language detection have been studied. It is observed that, with some empirical restrictions, it is possible to achieve more than 80% language detection accuracy, and that practical systems can be designed based on the current research.
  • Keywords
    "Optical character recognition software","Error analysis","Shape","Java"
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDAR.2015.7333847
  • Filename
    7333847