Title :
Numerical sequence extraction in handwritten incoming mail documents
Author :
Koch, G. ; Heutte, L. ; Paquet, T.
Author_Institution :
Lab. PSI, Univ. de Rouen, France
Abstract :
In this communication, we propose a method for the automatic extraction of numerical fields in handwritten documents. The approach exploits the known syntactic structure of the numerical field to be extracted, combined with a set of contextual morphological features, to find the best label for each connected component. Applying an HMM-based syntactic analyzer to the whole document makes it possible to localize and extract the fields of interest. Reported results on the extraction of zip codes, phone numbers and customer codes from handwritten incoming mail documents demonstrate the value of the proposed approach.
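The abstract describes labelling each connected component and then running an HMM-based syntactic analyzer over the whole document to locate a field. Below is a minimal sketch of that general idea, not the authors' implementation. It rests on several assumptions not taken from the paper: each component already carries a (digit, other) score pair from some classifier, the target field is a 5-digit zip code with no separators (phone numbers would need separator states), and the transition probabilities are arbitrary. The names `build_field_model`, `viterbi` and `extract_fields` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-component scores (P(digit), P(other)), one pair per
# connected component in reading order. They stand in for a component
# classifier's output; they are NOT the features or classifier of the paper.
Scores = Sequence[Tuple[float, float]]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def build_field_model(n_digits: int = 5):
    """Toy syntactic model: 'O' (other text) loops on itself and can enter a
    left-right chain D1..Dn of digit states, which then returns to 'O'."""
    states = ["O"] + [f"D{i}" for i in range(1, n_digits + 1)]
    trans: Dict[str, Dict[str, float]] = {s: {} for s in states}
    trans["O"] = {"O": 0.9, "D1": 0.1}  # assumed, untuned probabilities
    for i in range(1, n_digits):
        trans[f"D{i}"] = {f"D{i + 1}": 1.0}
    trans[f"D{n_digits}"] = {"O": 1.0}
    start = {s: 0.0 for s in states}
    start.update({"O": 0.9, "D1": 0.1})
    final = ["O", f"D{n_digits}"]  # a field may not be cut off mid-chain
    return states, trans, start, final


def viterbi(scores: Scores, model) -> Optional[List[str]]:
    """Most likely state sequence (log-domain Viterbi), or None if infeasible."""
    states, trans, start, final = model

    def emit(state: str, obs: Tuple[float, float]) -> float:
        # Digit states score the digit hypothesis, 'O' scores the other one.
        return _log(obs[0] if state.startswith("D") else obs[1])

    V = [{s: _log(start[s]) + emit(s, scores[0]) for s in states}]
    back: List[Dict[str, Optional[str]]] = [{}]
    for t in range(1, len(scores)):
        V.append({})
        back.append({})
        for s in states:
            best, arg = float("-inf"), None
            for p in states:
                a = trans[p].get(s, 0.0)
                if a > 0 and V[t - 1][p] + _log(a) > best:
                    best, arg = V[t - 1][p] + _log(a), p
            V[t][s] = best + emit(s, scores[t])
            back[t][s] = arg
    last = max(final, key=lambda s: V[-1][s])
    if V[-1][last] == float("-inf"):
        return None
    path = [last]
    for t in range(len(scores) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def extract_fields(path: List[str], n_digits: int = 5) -> List[List[int]]:
    """Component indices of each complete field (every chain starts at D1)."""
    return [list(range(t, t + n_digits)) for t, s in enumerate(path) if s == "D1"]


if __name__ == "__main__":
    comps = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.2),
             (0.95, 0.05), (0.9, 0.1), (0.85, 0.15), (0.1, 0.9)]
    path = viterbi(comps, build_field_model(5))
    print(path)                  # ['O', 'O', 'D1', 'D2', 'D3', 'D4', 'D5', 'O']
    print(extract_fields(path))  # [[2, 3, 4, 5, 6]]
```

Because the digit chain is strictly left-to-right and may only end in D5, the decoder can only accept runs of exactly five components as a field. This is how a syntactic constraint can override individually ambiguous component scores.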
Keywords :
computational linguistics; feature extraction; handwritten character recognition; hidden Markov models; mailing systems; HMM based syntactic analyzer; automatic extraction; contextual morphological features; customer codes; handwritten documents; handwritten incoming mail documents; numerical fields; numerical sequence extraction; phone numbers; syntactic structure; zip codes; Context; Data mining; Dispatching; Face recognition; Filters; Handwriting recognition; Hidden Markov models; Labeling; Postal services; Text processing;
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
DOI :
10.1109/ICDAR.2003.1227691