Title :
Identification and removal of extraneous graphics in a commercial OCR operation
Author :
Hashemi, Ray R. ; Epperson, Charlie ; Jones, Steve ; Jin, Lei ; Talburt, John
Author_Institution :
Comput. Sci. Dept., Arkansas Univ., Little Rock, AR, USA
Abstract :
The major issue in OCRing a document composed of a mixture of text and graphics (i.e., a mixed document) is the presence of graphics in the document. In this research effort, we propose two algorithms for the identification and removal of two special types of graphics, namely company logos and graphic displays with broken boundaries. A prototype was built and its performance evaluated on a test set of 198 scanned images of mixed documents. The prototype was able to remove 100% of the two types of graphics from the images.
Keywords :
optical character recognition; broken boundaries; commercial OCR operation; company logos; extraneous graphics identification; extraneous graphics removal; mixed document; Computer graphics; Computer science; Displays; Image analysis; Image enhancement; Optical character recognition software; Pattern recognition; Prototypes; Testing; Text analysis;
Conference_Title :
Proceedings of the 5th Biannual World Automation Congress, 2002
Print_ISBN :
1-889335-18-5
DOI :
10.1109/WAC.2002.1049574