• DocumentCode
    3695237
  • Title
    Can RNNs reliably separate script and language at word and line level?
  • Author
    Ajeet Kumar Singh;C. V. Jawahar
  • Author_Institution
    Center for Visual Information Technology, IIIT Hyderabad, India
  • fYear
    2015
  • Firstpage
    976
  • Lastpage
    980
  • Abstract
    In this work, we investigate the utility of Recurrent Neural Networks (RNNs) for script and language identification. Both of these problems have been attempted in the past with representations computed from the distribution of connected components or characters (e.g. texture, n-gram). Often these features are computed from a larger segment (a paragraph or a page). We argue that one can predict the script or language very accurately from minimal evidence (e.g. given only a word or a line) with the help of a pre-trained RNN. We propose a simple and generic solution for the task of script and language identification that does not require any special tuning. Our method represents word images as sequences of feature vectors and employs RNNs for the identification. We verify the method on a large corpus of more than 15.03M words from 55K document images comprising 15 scripts and languages. We report accurate script and language identification at the word and line levels.
  • Keywords
    "Reliability","Logic gates","Noise measurement"
  • Publisher
    ieee
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDAR.2015.7333907
  • Filename
    7333907