• DocumentCode
    2771272
  • Title

    Performance of Document Image OCR Systems for Recognizing Video Texts on Embedded Platform

  • Author

    Chattopadhyay, Tanushyam ; Sinha, Priyanka ; Biswas, Provat

  • Author_Institution
    Innovation Labs, Tata Consultancy Services Ltd., Kolkata, India
  • fYear
    2011
  • fDate
    7-9 Oct. 2011
  • Firstpage
    606
  • Lastpage
    610
  • Abstract
    Market demand for an embedded implementation of video OCR motivated the authors to evaluate how well existing document image OCR techniques perform on video text. The authors ported open source OCR systems such as GOCR and Tesseract to an embedded platform. On that platform, their character-level and word-level recognition accuracy proved unacceptable for video text. This paper compares two such open source OCR systems on Indian TV videos and proposes some techniques that can improve the recognition accuracy from 62% to 93%. The challenges of porting the code to an embedded platform are also analyzed.
  • Keywords
    embedded systems; optical character recognition; text analysis; video signal processing; GOCR; Indian TV video; Tesseract; character level recognition; document image OCR system; document image OCR technique; embedded platform; open source OCR system; video OCR; video text recognition; word level recognition; Accuracy; Character recognition; Engines; Optical character recognition software; Streaming media; Text recognition; ABBYY; FineReader; OCR; video
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Communication Networks (CICN), 2011 International Conference on
  • Conference_Location
    Gwalior
  • Print_ISBN
    978-1-4577-2033-8
  • Type

    conf

  • DOI
    10.1109/CICN.2011.131
  • Filename
    6112941