  • DocumentCode
    2013343
  • Title
    An Overview of the Tesseract OCR Engine
  • Author
    Smith, Ray
  • Author_Institution
    Google Inc., Mountain View
  • Volume
    2
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    629
  • Lastpage
    633
  • Abstract
    The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, including in particular the line finding, features/classification methods, and the adaptive classifier.
  • Keywords
    image classification; optical character recognition; Tesseract OCR engine; UNLV; adaptive classifier; line finding; Filters; Independent component analysis; Inspection; Open source software; Optical character recognition software; Pipelines; Prototypes; Search engines; Testing; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4376991
  • Filename
    4376991