Title :
Stochastic Segment Modeling for Offline Handwriting Recognition
Author :
Natarajan, Prem ; Subramanian, Krishna ; Bhardwaj, Anurag ; Prasad, Rohit
Author_Institution :
BBN Technol., Cambridge, MA, USA
Abstract :
In this paper, we present a novel approach for incorporating structural information into the hidden Markov modeling (HMM) framework for offline handwriting recognition. Traditionally, structural features have been used in recognition approaches that rely on accurate segmentation of words into smaller units (sub-words or characters). However, such segmentation-based approaches do not perform well on real-world handwritten images, because breaks and merges in glyphs typically create new connected components that are not observed in the training data. To mitigate the problem of having to derive accurate segmentation from connected components, we present a novel framework in which an HMM-based recognition system trained on shorter-span features is used to generate the 2D character images (the "stochastic segments"), and a second classifier that uses structural features extracted from these stochastic character segments then generates a new set of scores. Finally, the scores from the HMM system and from structural matching are combined to generate a hypothesis that is better than the results from either the HMM or structural matching alone. We demonstrate the efficacy of our approach by reporting experimental results on a large corpus of handwritten Arabic documents.
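The final combination step described in the abstract can be illustrated with a small sketch. The abstract does not state how the two scores are combined, so the weighted sum of log-scores below is an assumption made purely for illustration, not the authors' method. The names Hypothesis, combine, rescore, and weight, as well as the candidate strings and score values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str                # candidate transcription (hypothetical example data)
    hmm_score: float         # log-score assigned by the HMM system
    structural_score: float  # log-score from structural matching on the segments


def combine(h: Hypothesis, weight: float = 0.5) -> float:
    """Weighted sum of the two log-scores (an assumed combination rule)."""
    return (1.0 - weight) * h.hmm_score + weight * h.structural_score


def rescore(nbest: List[Hypothesis], weight: float = 0.5) -> Hypothesis:
    """Return the candidate with the highest combined score."""
    if not nbest:
        raise ValueError("the candidate list is empty")
    return max(nbest, key=lambda h: combine(h, weight))


# Two made-up candidates: the HMM alone prefers "katib",
# but the structural score shifts the combined decision to "kitab".
nbest = [
    Hypothesis("kitab", hmm_score=-10.0, structural_score=-8.0),
    Hypothesis("katib", hmm_score=-9.5, structural_score=-12.0),
]
best = rescore(nbest, weight=0.5)
```

Setting weight to 0 reduces the decision to the HMM score alone and setting it to 1 uses only the structural score; in practice such a weight would presumably be tuned on held-out data.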
Keywords :
feature extraction; handwriting recognition; handwritten character recognition; hidden Markov models; image classification; image matching; image segmentation; learning (artificial intelligence); text analysis; HMM; connected component; handwritten Arabic document; handwritten character image; hidden Markov modeling; machine learning; offline handwriting recognition; stochastic character segment; structural feature extraction; structural matching; Character generation; Character recognition; Data mining; Image recognition; Stochastic processes; Training data; Optical Character Recognition; Stochastic Segment Modeling
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2009.278