Regional Information Center for Science and Technology - Generation of synthetic training data for handwritten Indic script recognition

DocumentCode :

3695140

Title :

Generation of synthetic training data for handwritten Indic script recognition

Author :

Shivansh Gaur;Siddhant Sonkar;Partha Pratim Roy

Author_Institution :

Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India

Year :

2015

Firstpage :

491

Lastpage :

495

Abstract :

This paper presents a novel approach to creating synthetic datasets for word recognition systems. Our purpose is to improve the performance of off-line handwritten text recognizers by providing them with additional synthetic training data. For many languages, the lack of a proper dataset makes recognition systems hard to train. To address this problem, synthetic handwriting can be used to expand the existing training dataset. Any available digital text, such as that from online newspapers and similar sources, can be used to generate the synthetic data. The digital data is distorted in such a way that the underlying pattern is preserved, so that both machines and human readers can still identify the word. The resulting images can be used to train any classification system for handwriting recognition. This data can be used on its own to train the system, or it can be combined with natural handwritten data to augment the original dataset and improve accuracy. We experimented using only synthetic data and obtained high recognition accuracy in both character and word recognition. The data was tested on numerals in three Indian scripts (Hindi, Bengali and Telugu) and on words in one script (Hindi); the results are highly promising.
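
The abstract does not specify how the digital data is distorted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a rendered glyph is available as a small binary bitmap and warps it with a smooth, low-amplitude sinusoidal displacement so that the overall shape stays recognizable. The function names `distort` and `make_variants`, the parameters `max_shift` and `wavelength`, and the sample glyph are all hypothetical and chosen for illustration.

```python
import math
import random


def distort(bitmap, max_shift=1.0, wavelength=8.0, seed=None):
    """Warp a binary bitmap with a smooth sinusoidal displacement field.

    Assumption: this is an illustrative distortion, not the one used in
    the paper. `bitmap` is a list of equal-length rows of 0/1 values.
    """
    rng = random.Random(seed)
    height, width = len(bitmap), len(bitmap[0])

    # Random amplitude and phase per axis; small amplitudes keep the
    # underlying pattern intact.
    amp_x = rng.uniform(0.0, max_shift)
    amp_y = rng.uniform(0.0, max_shift)
    phase_x = rng.uniform(0.0, 2 * math.pi)
    phase_y = rng.uniform(0.0, 2 * math.pi)

    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Look up the source pixel at a smoothly shifted position
            # (nearest-neighbour sampling).
            src_x = round(x + amp_x * math.sin(2 * math.pi * y / wavelength + phase_x))
            src_y = round(y + amp_y * math.sin(2 * math.pi * x / wavelength + phase_y))
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = bitmap[src_y][src_x]
    return out


def make_variants(bitmap, count, seed=0):
    """Generate `count` distorted copies of one glyph, one seed per copy."""
    return [distort(bitmap, seed=seed + i) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical 6x6 glyph standing in for a rendered character.
    glyph = [
        [0, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
    ]
    for variant in make_variants(glyph, count=2):
        print("\n".join("".join("#" if p else "." for p in row) for row in variant))
        print()
```

Each variant keeps the dimensions of the input, so a set of variants could be fed directly to a classifier alongside, or instead of, scanned handwritten samples, as the abstract suggests.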

Keywords :

"Handwriting recognition","Image recognition","Accuracy","Trajectory","Distortion","Testing","Principal component analysis"

Publisher :

IEEE

Conference_Title :

2015 13th International Conference on Document Analysis and Recognition (ICDAR)

Type :

conf

DOI :

10.1109/ICDAR.2015.7333810

Filename :

7333810

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3695140