Title :
An Overview of the Tesseract OCR Engine
Author_Institution :
Google Inc., Mountain View
Abstract :
The Tesseract OCR engine, which was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, in particular the line finding, the features/classification methods, and the adaptive classifier.
Keywords :
image classification; optical character recognition; Tesseract OCR engine; UNLV; adaptive classifier; line finding; Filters; Independent component analysis; Inspection; Open source software; Optical character recognition software; Pipelines; Prototypes; Search engines; Testing; Text recognition
Conference_Title :
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)
Conference_Location :
Curitiba, Paraná, Brazil
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4376991