DocumentCode :
1645554
Title :
Text block recognition from TIFF images
Author :
Lovegrove, William ; Elliman, David
Author_Institution :
Nottingham Univ., UK
fYear :
1995
fDate :
11/2/1995 12:00:00 AM
Firstpage :
4/1
Lastpage :
4/6
Abstract :
The reproduction of a scanned document should include not only the optical character recognition of text, but also the structure of that text on the page and the appearance of the text itself (i.e. font recognition). This paper presents an algorithm which structurally recognises the text of a page image. The method is based upon the "Docstrum plot" algorithm by L. O'Gorman (1993). Modifications have been made to O'Gorman's algorithm which give very good results, particularly in identifying paragraphs and lines. The algorithm implementation can, to a limited degree, describe the logical relationship of the text elements of the original page. Its limitations are due to the lack of information available without OCR and font technology incorporated into the implementation. The implementation has a graphical interface which portrays the state of the algorithm during the process of decomposition.
Keywords :
document image processing; optical character recognition; pattern recognition; Docstrum plot algorithm; OCR; TIFF image; algorithm; font recognition; graphical interface; page image; page layout; scanned document; text block recognition; text recognition; text structure
fLanguage :
English
Publisher :
IET
Conference_Titel :
Document Image Processing and Multimedia Environments, IEE Colloquium on
Conference_Location :
London
Type :
conf
DOI :
10.1049/ic:19951185
Filename :
498878