• DocumentCode
    153371
  • Title

    OCR Performance Prediction Using a Bag of Allographs and Support Vector Regression

  • Author

    Bhowmik, Tapan Kumar; Paquet, T.; Ragot, N.

  • Author_Institution
    LITIS EA-4108, Univ. de Rouen, Rouen, France
  • fYear
    2014
  • fDate
    7-10 April 2014
  • Firstpage
    202
  • Lastpage
    206
  • Abstract
    In this paper, we describe a novel and simple technique for predicting OCR results without using any OCR. The technique uses a bag of allographs to characterize textual components; a support vector regression (SVR) technique is then used to build a predictor based on the bag of allographs. The performance of the system is evaluated on a corpus of historical documents. The predictions of OCR results on training and test documents fall within standard deviations of 4.18% and 6.54%, respectively. The proposed system has been designed as a tool to assist the selection of corpora in libraries and to specify the typical performance that can be expected on the selection.
  • Keywords
    document image processing; optical character recognition; regression analysis; support vector machines; OCR performance prediction; SVR technique; bag of allographs; historical documents; standard deviation; support vector regression; system performance; textual components; Accuracy; Buildings; Image edge detection; Libraries; Optical character recognition software; Training; Vectors; Bag of Allographs; Historical Documents; OCR; OCR Performance Prediction; Support Vector Regression (SVR); Template Matching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
  • Conference_Location
    Tours
  • Print_ISBN
    978-1-4799-3243-6
  • Type

    conf

  • DOI
    10.1109/DAS.2014.72
  • Filename
    6830998