DocumentCode
3141237
Title
Extraction of type style based meta-information from imaged documents
Author
Garain, U. ; Chaudhuri, B.B.
Author_Institution
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
fYear
1999
fDate
20-22 Sep 1999
Firstpage
341
Lastpage
344
Abstract
Extraction of some meta-information from printed documents without an OCR approach is considered. It can be statistically verified that important terms in articles are printed in italic, bold and all capital style. Detection of these type styles helps in automatic extraction of the lines containing titles, authors' names, subtitles, references as well as sentences having important terms occurring in the text. It also helps in improving the OCR performance for reading the italic text. Some experimental results on the performance of the approach on good quality as well as degraded document images are presented.
Keywords
character sets; document image processing; experimental results; line extraction; meta-information; printed documents; sentences; statistics; terms; type style information; Computer vision; Data mining; Image converters; Optical character recognition software; Pattern recognition; Postal services; Pressing; Read only memory; Search engines; Sections
fLanguage
English
Publisher
IEEE
Conference_Title
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location
Bangalore
Print_ISBN
0-7695-0318-7
Type
conf
DOI
10.1109/ICDAR.1999.791794
Filename
791794