DocumentCode
3695177
Title
Language identification from handwritten documents
Author
Luc Mioulet;Utpal Garain;Clément Chatelain;Philippine Barlas;Thierry Paquet
Author_Institution
Laboratoire LITIS - EA 4108, Universite de Rouen, FRANCE 76800
fYear
2015
Firstpage
676
Lastpage
680
Abstract
This paper presents a novel approach for language identification in handwritten documents. The approach is based on script identification followed by character recognition. BLSTM-CTC based handwriting recognizers are used, and the OCR output is fed to a statistical language identifier that detects the language of the input handwritten document. Documents in two scripts (Latin and Bengali) and four languages (English, French, Bengali and Assamese) are considered for evaluation. Several alternative frameworks have been explored, and the effects of handwriting recognition and text length on language detection have been studied. It is observed that, with some empirical restrictions, it is possible to achieve more than 80% language detection accuracy, and that practical systems can be designed based on the current research.
Keywords
"Optical character recognition software","Error analysis","Shape","Java"
Publisher
ieee
Conference_Titel
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type
conf
DOI
10.1109/ICDAR.2015.7333847
Filename
7333847
Link To Document