DocumentCode :
2499713
Title :
Script Identification – A Han and Roman Script Perspective
Author :
Chanda, Sukalpa ; Pal, Umapada ; Franke, Katrin ; Kimura, Fumitaka
Author_Institution :
Dept. of Comput. Sci. & Media Tech., Gjovik Univ. Coll., Gjovik, Norway
fYear :
2010
fDate :
23-26 Aug. 2010
Firstpage :
2708
Lastpage :
2711
Abstract :
All Han-based scripts (Chinese, Japanese, and Korean) share similar visual characteristics, so identifying Chinese, Japanese, and Korean scripts within a single document page is quite challenging. A Han-based document page may also contain Roman script. A multi-script OCR system that handles Chinese, Japanese, Korean, and Roman scripts must therefore identify the script before running the corresponding OCR module. We propose a system that addresses this problem using directional features along with a Gaussian kernel-based Support Vector Machine. We obtained promising results: 98.39% script identification accuracy at the character level and 99.85% at the block level, when no rejection was considered.
Keywords :
Gaussian processes; document image processing; image classification; natural languages; optical character recognition; support vector machines; Chinese script identification; Gaussian kernel; Han-based script identification; Japanese script identification; Korean script identification; SVM; multiscript OCR system; support vector machine; Accuracy; Feature extraction; Image segmentation; Kernel; Optical character recognition software; Support vector machines; Training; Document Analysis; Multi-script OCR; SVM; Script Identification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
ISSN :
1051-4651
Print_ISBN :
978-1-4244-7542-1
Type :
conf
DOI :
10.1109/ICPR.2010.1127
Filename :
5597017