• DocumentCode
    1633493
  • Title

    Italic or Roman: Word Style Recognition without A Priori Knowledge for Old Printed Documents

  • Author

    Eynard, Loris ; Emptoz, Hubert

  • Author_Institution
    CNRS INSA-Lyon, Univ. de Lyon, Lyon, France
  • fYear
    2009
  • Firstpage
    823
  • Lastpage
    827
  • Abstract
    This paper presents an Italic/Roman word type recognition system that requires no a priori knowledge of the characters' font. The method targets old documents in which character segmentation is not trivial. Our approach therefore segments the document into words and analyzes the text word by word. To determine the word style, we combine three criteria based on the visual differences between a word and a slanted version of the same word. These criteria are defined from features computed on the vertical projection profile of the word. Because we do not assume a specific slant angle, we compute these measures over a whole range of possible slant angles and then sum the obtained scores. Our results show a recognition rate of 100% for Italic words and 97.2% for Roman words.
  • Keywords
    document handling; pattern recognition; text analysis; Italic-Roman word type recognition; document segmentation; old printed document; slant angle; word style recognition; word vertical projection profile; Character recognition; Feature extraction; Histograms; Humans; Image segmentation; Ink; Optical character recognition software; Text analysis; Text recognition; Typesetting; Italic Recognition; old documents; segmentation-free; word style
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2009.176
  • Filename
    5277521
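
The idea summarized in the abstract (compare a word's vertical projection profile with that of slanted versions over a range of angles, then sum the scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the three criteria, so the single "peakiness" measure (sum of squared column counts), the angle range of 5 to 30 degrees, the sign-based decision rule, and all function names are assumptions chosen for illustration only. A word image is assumed to be a binary grid (list of rows of 0/1, ink = 1).

```python
import math

def vertical_projection(img):
    """Count ink pixels in each column of a binary image (rows of 0/1)."""
    return [sum(col) for col in zip(*img)]

def shear(img, angle_deg):
    """Shift each row left in proportion to its height above the bottom row,
    which undoes a rightward slant of angle_deg degrees."""
    h, w = len(img), len(img[0])
    t = math.tan(math.radians(angle_deg))
    pad = int(math.ceil((h - 1) * abs(t)))  # room for the largest shift
    out = []
    for y, row in enumerate(img):
        shift = int(round((h - 1 - y) * t))  # top rows move the most
        new_row = [0] * (w + 2 * pad)
        for x, v in enumerate(row):
            if v:
                new_row[x + pad - shift] = 1
        out.append(new_row)
    return out

def peakiness(profile):
    """Hypothetical criterion: sum of squared column counts.
    It grows when ink concentrates in few columns (aligned vertical strokes)."""
    return sum(c * c for c in profile)

def slant_score(img, angles=range(5, 31, 5)):
    """Sum, over the assumed angle range, of the gain in peakiness
    obtained by de-slanting the word image."""
    base = peakiness(vertical_projection(img))
    return sum(peakiness(vertical_projection(shear(img, a))) - base
               for a in angles)

def is_italic(img):
    """Assumed decision rule: de-slanting helps on balance -> Italic."""
    return slant_score(img) > 0
```

On an upright stroke, shearing spreads the ink over more columns and the summed score is negative; on a stroke slanted to the right, shearing realigns the ink and the score is positive. A real system along the lines of the abstract would combine several criteria and would need tuning on actual word images.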