DocumentCode :
3695251
Title :
A sequence learning approach for multiple script identification
Author :
Adnan Ul-Hasan;Muhammad Zeshan Afzal;Faisal Shafait;Marcus Liwicki;Thomas M. Breuel
Author_Institution :
Department of Computer Science, University of Kaiserslautern, Germany
fYear :
2015
Firstpage :
1046
Lastpage :
1050
Abstract :
In this paper, we present a novel methodology for multiple script identification using the sequence-learning capabilities of Long Short-Term Memory (LSTM) networks. Our method identifies multiple scripts at the text-line level, where two or more scripts are present in the same text line. Unlike traditional techniques, which extract either shape features or bounding boxes of individual characters, the LSTM-based system learns each script in a supervised learning framework. Moreover, the system needs neither specific features nor any preprocessing other than text-line extraction and text-line normalization. The proposed method works at the text-line level, identifying each character as belonging to a particular script. We have developed a database consisting of English and Greek scripts, and our system achieved a script recognition accuracy of 98.186% on this dataset.
Keywords :
"Optical character recognition software","Radio frequency"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333921
Filename :
7333921