DocumentCode
1633493
Title
Italic or Roman: Word Style Recognition without A Priori Knowledge for Old Printed Documents
Author
Eynard, Loris ; Emptoz, Hubert
Author_Institution
CNRS INSA-Lyon, Univ. de Lyon, Lyon, France
fYear
2009
Firstpage
823
Lastpage
827
Abstract
This paper presents an Italic/Roman word type recognition system that requires no a priori knowledge of the characters' font. The method targets old documents, in which character segmentation is not trivial; therefore, our approach segments the document into words and analyzes the text word by word. To determine the word style, we combine three criteria based on the visual differences between a word and a slanted version of the same word. These criteria are defined using features computed from the vertical projection profile of the word. Because we do not assume a specific slant angle, we compute these measures over a whole range of possible slant angles and then sum the obtained scores. Our results show recognition rates of 100% for Italic words and 97.2% for Roman words.
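The abstract does not specify the three criteria, so the following is only a minimal sketch of the general idea it describes: shear a word image over a range of slant angles, compare vertical projection profiles of the original and sheared word, and sum the per-angle scores. Everything concrete here is an assumption, not the authors' method: a binary word image with ink = 1, a single hypothetical "profile sharpness" criterion (sum of squared column counts), an angle range of 5° to 30°, and a decision threshold of zero. The function names (`shear`, `vertical_profile`, `sharpness`, `roman_score`, `classify`) are hypothetical.

```python
import numpy as np


def shear(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Shift each row horizontally: a positive angle leans the top of the
    word to the right (italic-like), a negative angle leans it to the left.
    The canvas is widened so that no ink is lost."""
    h, w = img.shape
    t = np.tan(np.radians(angle_deg))
    # Row y (0 = top) moves by (h - 1 - y) * t pixels; the bottom row stays put.
    shifts = np.rint((h - 1 - np.arange(h)) * t).astype(int)
    pad = int(np.abs(shifts).max())
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    for y in range(h):
        start = pad + shifts[y]
        out[y, start:start + w] = img[y]
    return out


def vertical_profile(img: np.ndarray) -> np.ndarray:
    """Vertical projection profile: number of ink pixels in each column."""
    return img.sum(axis=0).astype(float)


def sharpness(profile: np.ndarray) -> float:
    """Hypothetical criterion: total ink is fixed, so the sum of squares is
    larger when ink is concentrated in few columns (upright strokes)."""
    return float(np.sum(profile ** 2))


def roman_score(img: np.ndarray, angles=range(5, 31, 5)) -> float:
    """Sum, over a range of angles, the relative sharpness lost when the word
    is sheared to the left. Upright (Roman) strokes get smeared, giving
    positive terms; right-leaning (Italic) strokes get straightened,
    giving negative terms."""
    base = sharpness(vertical_profile(img))
    return sum(
        (base - sharpness(vertical_profile(shear(img, -a)))) / base
        for a in angles
    )


def classify(img: np.ndarray, angles=range(5, 31, 5)) -> str:
    """Assumed decision rule: a positive summed score means Roman."""
    return "Roman" if roman_score(img, angles) > 0 else "Italic"


if __name__ == "__main__":
    # Synthetic "Roman" word: four upright strokes, 2 px wide.
    roman = np.zeros((30, 40), dtype=np.uint8)
    for c in (5, 15, 25, 35):
        roman[:, c:c + 2] = 1
    # Synthetic "Italic" word: the same strokes slanted by 15 degrees.
    italic = shear(roman, 15)
    print(classify(roman), classify(italic))
```

Summing over several angles, rather than estimating one slant, mirrors the abstract's point that no specific slant angle is assumed: an Italic word whose true slant falls anywhere in the range still produces strongly negative terms near that angle.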
Keywords
document handling; pattern recognition; text analysis; Italic-Roman word type recognition; document segmentation; old printed document; slant angle; word style recognition; word vertical projection profile; Character recognition; Feature extraction; Histograms; Humans; Image segmentation; Ink; Optical character recognition software; Text analysis; Text recognition; Typesetting; Italic Recognition; old documents; segmentation-free; word style
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location
Barcelona
ISSN
1520-5363
Print_ISBN
978-1-4244-4500-4
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2009.176
Filename
5277521