• DocumentCode
    3141237
  • Title
    Extraction of type style based meta-information from imaged documents
  • Author
    Garain, U.; Chaudhuri, B.B.
  • Author_Institution
    Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
  • fYear
    1999
  • fDate
    20-22 Sep 1999
  • Firstpage
    341
  • Lastpage
    344
  • Abstract
    Extraction of some meta-information from printed documents without an OCR approach is considered. It can be statistically verified that important terms in articles are printed in italic, bold and all capital style. Detection of these type styles helps in automatic extraction of the lines containing titles, authors' names, subtitles, references as well as sentences having important terms occurring in the text. It also helps in improving the OCR performance for reading the italic text. Some experimental results on the performance of the approach on good quality as well as degraded document images are presented.
  • Keywords
    character sets; document image processing; experimental results; line extraction; meta-information; printed documents; sentences; statistics; terms; type style information; Computer vision; Data mining; Image converters; Optical character recognition software; Pattern recognition; Postal services; Pressing; Read only memory; Search engines; Sections
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-7695-0318-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1999.791794
  • Filename
    791794