DocumentCode
183345
Title
Writing Type and Language Identification in Heterogeneous and Complex Documents
Author
Hebert, Dave ; Barlas, Panagiotis ; Chatelain, C. ; Adam, S. ; Paquet, T.
Author_Institution
Lab. LITIS - EA 4108, Univ. de Rouen, Rouen, France
fYear
2014
fDate
1-4 Sept. 2014
Firstpage
411
Lastpage
416
Abstract
This paper presents a system dedicated to the automatic recognition of both the writing type and the language of text regions in heterogeneous and complex documents. The system is able to process documents with mixed printed and handwritten text in various languages (French, English and Arabic). To handle this problem, we divided it into two sub-tasks: writing type identification and language identification. The method for writing type identification is based on the analysis of connected components, while the language identification approach combines the analysis of connected components with the analysis of character distributions. We present the results obtained by the system during the second competition round of the MAURDOR campaign, and show that the performance of our system compares favorably with that of other participants.
Keywords
document image processing; handwritten character recognition; natural language processing; text analysis; character distributions; component analysis; document text regions; language identification; writing type identification; Feature extraction; Hidden Markov models; Measurement; Optical character recognition software; Shape; Text recognition; Writing; codebook; document processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location
Heraklion
ISSN
2167-6445
Print_ISBN
978-1-4799-4335-7
Type
conf
DOI
10.1109/ICFHR.2014.75
Filename
6981054