Regional Information Center for Science and Technology - Script Identification

DocumentCode :

2499713

Title :

Script Identification – A Han and Roman Script Perspective

Author :

Chanda, Sukalpa ; Pal, Umapada ; Franke, Katrin ; Kimura, Fumitaka

Author_Institution :

Dept. of Comput. Sci. & Media Tech., Gjovik Univ. Coll., Gjovik, Norway

fYear :

2010

fDate :

23-26 Aug. 2010

Firstpage :

2708

Lastpage :

2711

Abstract :

All Han-based scripts (Chinese, Japanese, and Korean) share similar visual characteristics, so developing a system that identifies Chinese, Japanese, and Korean scripts on a single document page is challenging. A Han-based document page may also contain Roman script. A multi-script OCR system handling Chinese, Japanese, Korean, and Roman scripts must identify the script before running the respective OCR module. We propose a system that addresses this problem using directional features with a Gaussian kernel-based Support Vector Machine. The system achieved 98.39% script identification accuracy at the character level and 99.85% at the block level when no rejection was considered.
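
The abstract names two ingredients, directional features and a Gaussian kernel, without giving their details. The following is only a minimal sketch of what such ingredients can look like, not the authors' method: the function names `directional_histogram` and `gaussian_kernel`, the 8-bin gradient-direction histogram, and the `gamma` parameter are all assumptions chosen for illustration, and SVM training itself is omitted.

```python
import math

def directional_histogram(image, bins=8):
    """Hypothetical directional feature: a histogram of gradient
    directions over a grayscale image (list of rows), weighted by
    gradient magnitude and normalized to sum to 1."""
    hist = [0.0] * bins
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the local gradient.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Map the direction in [0, 2*pi) to one of `bins` sectors.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            idx = int(angle / (2 * math.pi) * bins) % bins
            hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two toy 4x4 "character" images: a vertical edge and a horizontal edge.
vertical = [[0, 0, 1, 1]] * 4
horizontal = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]

f_vertical = directional_histogram(vertical)
f_horizontal = directional_histogram(horizontal)
similarity = gaussian_kernel(f_vertical, f_horizontal)
```

In this sketch, images whose strokes run in different directions yield different histograms, so their kernel similarity falls below 1, while identical feature vectors give exactly 1. A kernel SVM would use such similarities to separate the script classes.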

Keywords :

Gaussian processes; document image processing; image classification; natural languages; optical character recognition; support vector machines; Chinese script identification; Gaussian kernel; Han-based script identification; Japanese script identification; Korean script identification; SVM; multiscript OCR system; support vector machine; Accuracy; Feature extraction; Image segmentation; Kernel; Optical character recognition software; Support vector machines; Training; Document Analysis; Multi-script OCR; SVM; Script Identification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition (ICPR), 2010 20th International Conference on

Conference_Location :

Istanbul

ISSN :

1051-4651

Print_ISBN :

978-1-4244-7542-1

Type :

conf

DOI :

10.1109/ICPR.2010.1127

Filename :

5597017

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2499713