• DocumentCode
    3445985
  • Title
    Keyword spotting in degraded document using mixed OCR and word shape coding
  • Author
    Xia, Yong; Quan, Guangri; Xu, Yongdong; Sun, Yushan
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • Volume
    3
  • fYear
    2010
  • fDate
    29-31 Oct. 2010
  • Firstpage
    411
  • Lastpage
    414
  • Abstract
    This paper presents a new approach to keyword spotting in degraded imaged documents. Two prevalent word-indexing methods, OCR and word shape coding, are combined compactly on the basis of recognition confidence evaluation. The basic procedure is as follows. First, OCR candidates are used for OCR indexing. Second, a new stroke feature and a convex-concave feature of words are adopted for word shape coding. Third, an intelligent indexing scheme based on recognition confidence is introduced, which adapts to image quality. Finally, inexact matching is used for word spotting. A collection of 1553 scanned imaged documents from the NLM is used to evaluate the method, and the results confirm its validity.
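The abstract describes a pipeline in which each word is indexed by OCR text when recognition is reliable and by a word shape code otherwise, followed by inexact matching. The Python sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption: the threshold `CONF_THRESHOLD`, the `WordEntry` record, the `spot` function, Levenshtein distance as the inexact matcher, and `shape_code`, a simplified ascender/descender/x-height code substituted for the paper's stroke and convex-concave features (which are computed from word images, not text).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cut-off: the abstract does not state how confidence is thresholded.
CONF_THRESHOLD = 0.8

# Simplified word shape code (assumption): ascender-like characters -> "A",
# descender letters -> "D", everything else -> "x". This stands in for the
# paper's stroke and convex-concave features, which come from word images.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word: str) -> str:
    """Map a text word to a coarse ascender/descender/x-height code."""
    out = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("D")
        else:
            out.append("x")
    return "".join(out)


@dataclass
class WordEntry:
    ocr_text: str      # best OCR candidate for one word image
    ocr_conf: float    # recognition confidence in [0, 1]
    shape: str         # shape code extracted from the word image


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def spot(keyword: str, entries: list[WordEntry], max_dist: int = 1) -> list[int]:
    """Return positions of indexed words that inexactly match the keyword.

    High-confidence words are matched on their OCR text; low-confidence
    words fall back to comparing the keyword's shape code with theirs.
    """
    query_text = keyword.lower()
    query_shape = shape_code(keyword)
    hits = []
    for pos, entry in enumerate(entries):
        if entry.ocr_conf >= CONF_THRESHOLD:
            dist = levenshtein(query_text, entry.ocr_text.lower())
        else:
            dist = levenshtein(query_shape, entry.shape)
        if dist <= max_dist:
            hits.append(pos)
    return hits


# Toy index: in practice `shape` would be extracted from the word image.
entries = [
    WordEntry("keyword", 0.95, shape_code("keyword")),    # clean OCR, exact match
    WordEntry("kcyword", 0.90, shape_code("keyword")),    # one OCR error
    WordEntry("k?yw?rd", 0.30, shape_code("keyword")),    # degraded: use shape
    WordEntry("spotting", 0.99, shape_code("spotting")),  # unrelated word
]
hits = spot("keyword", entries)
```

Here `hits` is `[0, 1, 2]`: the second word matches despite an OCR error, and the third matches through its shape code. A coarse code like this maps many different words to the same string, which is why this sketch consults it only when OCR confidence is low.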
  • Keywords
    document image processing; feature extraction; image matching; indexing; information retrieval; optical character recognition; word processing; OCR; convex concave feature; image quality; imaged document; keyword spotting; stroke feature; word indexing; word shape coding; Barium; Biomedical imaging; Measurement; Optical character recognition software; OCR indexing; degraded imaged document
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing and Intelligent Systems (ICIS), 2010 IEEE International Conference on
  • Conference_Location
    Xiamen
  • Print_ISBN
    978-1-4244-6582-8
  • Type
    conf
  • DOI
    10.1109/ICICISYS.2010.5658616
  • Filename
    5658616