DocumentCode :
3695237
Title :
Can RNNs reliably separate script and language at word and line level?
Author :
Ajeet Kumar Singh;C. V. Jawahar
Author_Institution :
Center for Visual Information Technology, IIIT Hyderabad, India
fYear :
2015
Firstpage :
976
Lastpage :
980
Abstract :
In this work, we investigate the utility of Recurrent Neural Networks (RNNs) for script and language identification. Both problems have been attempted in the past with representations computed from the distribution of connected components or characters (e.g. texture, n-gram). These features are often computed from a larger segment (a paragraph or a page). We argue that one can predict the script or language very accurately from minimal evidence (e.g. given only a word or a line) with the help of a pre-trained RNN. We propose a simple and generic solution for script and language identification that does not require any special tuning. Our method represents word images as a sequence of feature vectors and employs RNNs for the identification. We verify the method on a large corpus of more than 15.03M words from 55K document images comprising 15 scripts and languages. We report accurate script and language identification at the word and line level.
Keywords :
"Reliability","Logic gates","Noise measurement"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333907
Filename :
7333907