• DocumentCode
    3695251
  • Title
    A sequence learning approach for multiple script identification
  • Author
    Adnan Ul-Hasan;Muhammad Zeshan Afzal;Faisal Shafait;Marcus Liwicki;Thomas M. Breuel
  • Author_Institution
    Department of Computer Science, University of Kaiserslautern, Germany
  • fYear
    2015
  • Firstpage
    1046
  • Lastpage
    1050
  • Abstract
    In this paper, we present a novel methodology for multiple script identification using the sequence-learning capabilities of Long Short-Term Memory (LSTM) networks. Our method is able to identify multiple scripts at the text-line level, where two or more scripts are present in the same text line. Unlike traditional techniques, where either shape features or bounding boxes of individual characters are extracted, the LSTM-based system learns a particular script in a supervised learning framework. Moreover, this system needs neither specific features nor any preprocessing steps other than text-line extraction and text-line normalization. The proposed method works at the text-line level, where it identifies each character as belonging to a particular script. We have developed a database consisting of English and Greek scripts, and our system achieved a script recognition accuracy of 98.186% on this dataset.
  • Keywords
    "Optical character recognition software","Radio frequency"
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDAR.2015.7333921
  • Filename
    7333921