DocumentCode :
3695140
Title :
Generation of synthetic training data for handwritten Indic script recognition
Author :
Shivansh Gaur;Siddhant Sonkar;Partha Pratim Roy
Author_Institution :
Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India
fYear :
2015
Firstpage :
491
Lastpage :
495
Abstract :
This paper presents a novel approach to creating a synthetic dataset for word recognition systems. Our purpose is to improve the performance of off-line handwritten text recognizers by providing them with additional synthetic training data. The lack of suitable datasets for many languages makes it hard to train recognition systems, and synthetic handwriting can be used to expand the existing training data. Any available digital text, such as that from online newspapers and similar sources, can be used to generate this synthetic data. The digital data is distorted in such a way that the underlying pattern is preserved, so the word remains identifiable by both machines and human readers. The resulting images can be used to train any classification system for handwriting recognition, either on their own or combined with natural handwritten data to augment the original dataset and improve accuracy. Experiments using only synthetic data achieved high recognition accuracy in both character and word recognition. The data was tested on numerals in 3 Indian scripts (Hindi, Bengali and Telugu) and on words in 1 script (Hindi), and the results are highly promising.
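The abstract says only that digital text is distorted while its underlying pattern is preserved; this record does not specify the distortion model the authors used. As a hedged illustration of the general idea, the sketch below applies elastic distortion, a generic technique (a random per-pixel displacement field smoothed with a Gaussian, as popularised by Simard et al. for handwritten digits), which is not necessarily the authors' method. It assumes a rendered glyph stored as a list of rows with ink values in [0, 1]; the function names and the default `alpha` (displacement scale) and `sigma` (smoothing width) are hypothetical choices for illustration only.

```python
import math
import random
from typing import List

# Assumed convention: grayscale image as a list of rows, ink = 1.0, background = 0.0.
Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalised 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(field: Image, sigma: float) -> Image:
    """Separable Gaussian blur; borders are handled by clamping indices."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(field), len(field[0])
    rows = [[sum(k[j + r] * field[y][min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def bilinear(img: Image, y: float, x: float) -> float:
    """Sample img at a fractional position; outside the image reads as background (0.0)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0

    def px(yy: int, xx: int) -> float:
        return img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0.0

    return ((1 - fy) * (1 - fx) * px(y0, x0) + (1 - fy) * fx * px(y0, x0 + 1)
            + fy * (1 - fx) * px(y0 + 1, x0) + fy * fx * px(y0 + 1, x0 + 1))


def elastic_distort(img: Image, alpha: float = 34.0, sigma: float = 4.0, seed: int = 0) -> Image:
    """Warp img with a smooth random displacement field (generic elastic distortion).

    Hypothetical helper: alpha scales the displacement and sigma controls its
    smoothness; both defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    # Independent uniform noise in [-1, 1] per pixel, smoothed so nearby pixels move together.
    dx = blur([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], sigma)
    dy = blur([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], sigma)
    # Each output pixel takes its value from the displaced source position.
    return [[bilinear(img, y + alpha * dy[y][x], x + alpha * dx[y][x]) for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    # A 28x28 vertical stroke standing in for a rendered character.
    glyph = [[1.0 if 12 <= x <= 15 and 4 <= y <= 23 else 0.0 for x in range(28)]
             for y in range(28)]
    warped = elastic_distort(glyph, seed=1)
    for row in warped[::4]:
        print("".join("#" if v > 0.5 else "." for v in row))
```

Calling `elastic_distort(glyph, seed=k)` for several values of `k` yields several distorted variants of one rendered glyph. Because the displacement field is smoothed, neighbouring pixels move together, which is what keeps the stroke shape recognisable; a larger `sigma` gives smoother warps and a larger `alpha` gives stronger deformation.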
Keywords :
"Handwriting recognition","Image recognition","Accuracy","Trajectory","Distortion","Testing","Principal component analysis"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333810
Filename :
7333810