An Overview of the Tesseract OCR Engine

Author

Smith, Ray

Author_Institution

Google Inc., Mountain View

Year

2007

Date

23-26 Sept. 2007

Firstpage

629

Lastpage

633

Abstract

The Tesseract OCR engine, which was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, in particular the line finding, the features/classification methods, and the adaptive classifier.

Keywords

image classification; optical character recognition; Tesseract OCR engine; UNLV; adaptive classifier; line finding; Filters; Independent component analysis; Inspection; Open source software; Optical character recognition software; Pipelines; Prototypes; Search engines; Testing; Text recognition

Language

English

Publisher

IEEE

Conference_Title

Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on

Conference_Location

Parana

ISSN

1520-5363

Print_ISBN

978-0-7695-2822-9

Type

conf

DOI

10.1109/ICDAR.2007.4376991

Filename

4376991

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2013343