• DocumentCode
    3409910
  • Title
    Multi-frame combination for robust videotext recognition
  • Author
    Prasad, Rohit ; Saleem, Shirin ; Macrostie, Ehry ; Natarajan, Prem ; Decerbo, Michael

  • Author_Institution
    BBN Technol., Cambridge, MA
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    1357
  • Lastpage
    1360
  • Abstract
    Optical character recognition (OCR) of overlaid text in video streams is a challenging problem due to various factors including the presence of dynamic backgrounds, color, and low resolution. In video feeds such as Broadcast News, a particular overlaid text region usually persists for multiple frames during which the background may or may not vary. In this paper we explore two innovative techniques that exploit such multi-frame persistence of videotext. The first technique uses multiple instances to generate a single enhanced image for recognition. The second technique uses the NIST ROVER algorithm developed for speech recognition to combine 1-best hypotheses from different frames of a text region. Significant improvement in the word error rate (WER) is obtained by using ROVER when compared to recognizing a single instance. The WER is further reduced by combining hypotheses from frame instances, which were generated using character models trained with different binarization thresholds. A 20% relative reduction in the WER was achieved for multi-frame combination over decoding a single frame instance.
  • Keywords
    optical character recognition; text analysis; video signal processing; 1-best hypotheses; multiframe combination; multiframe persistence; overlaid text; robust videotext recognition; video streams; word error rate; Broadcasting; Character recognition; Feeds; Image recognition; Multimedia communication; NIST; Optical character recognition software; Robustness; Speech recognition; Streaming media; Hidden Markov Models; Optical Character Recognition; Videotext
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4517870
  • Filename
    4517870
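
The abstract describes combining 1-best hypotheses from several frames of the same text region and scoring the result by word error rate (WER). The sketch below illustrates that idea only in a heavily simplified form; it is not the NIST ROVER implementation and not the authors' system. ROVER aligns the hypotheses into a word transition network by dynamic programming before voting, whereas this sketch assumes the hypotheses are already aligned slot by slot, with an empty string marking a missing word. The function names `word_error_rate` and `vote`, and the sample frame strings, are hypothetical.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Assumes a non-empty reference string.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def vote(aligned_hypotheses):
    """Majority vote per slot over pre-aligned word lists (assumption:
    all lists have equal length; "" marks a missing word)."""
    combined = []
    for slot in zip(*aligned_hypotheses):
        word, _count = Counter(slot).most_common(1)[0]
        if word:  # skip slots where the empty word wins the vote
            combined.append(word)
    return " ".join(combined)


# Hypothetical 1-best outputs for three frames of one text region,
# each containing a different single-word recognition error.
reference = "BREAKING NEWS FROM LONDON"
frames = [
    ["BREAKING", "NEWS", "FR0M", "LONDON"],
    ["BREAKING", "NEWS", "FROM", "L0NDON"],
    ["BRFAKING", "NEWS", "FROM", "LONDON"],
]

single_frame_wer = word_error_rate(reference, " ".join(frames[0]))
combined_wer = word_error_rate(reference, vote(frames))
print(single_frame_wer, combined_wer)
```

Because each frame in this toy input errs on a different word, the per-slot vote recovers the reference string. Real frames need the alignment step first, since insertions and deletions shift word positions.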