• DocumentCode
    2541833
  • Title

    Chinese document image retrieval based on recognition candidates

  • Author

    Jia, Xuhui ; Xia, Yong ; Zhou, Rui ; Liang, Hongwei

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2012
  • fDate
    29-31 May 2012
  • Firstpage
    2892
  • Lastpage
    2897
  • Abstract
    Because the recognition rate for degraded Chinese documents is low, retrieval performance is poor when it relies directly on the OCR result. This paper proposes an indexing method that uses n-grams and recognition candidates to improve retrieval performance. To ease testing, the paper also presents a method that automatically generates the ground truth of imaged documents, synthesized degraded document images, and the ground truth of recognition candidates. Several large-scale synthesized document image collections are built and used, and the experimental results show that retrieval performance improves for collections with both high and low OCR error rates.
  • Keywords
    document image processing; image retrieval; indexing; natural languages; optical character recognition; OCR error rates; degraded Chinese document image retrieval; imaged document ground-truth automatic generation; indexing method; n-gram; recognition candidate ground-truth automatic generation; recognition rate; retrieval performance improvement; synthesized degraded document image; Character recognition; Degradation; Estimation; Image recognition; Image retrieval; Indexing; Optical character recognition software; Chinese document image retrieval; indexing method with n-gram and recognition candidates
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
  • Conference_Location
    Sichuan
  • Print_ISBN
    978-1-4673-0025-4
  • Type

    conf

  • DOI
    10.1109/FSKD.2012.6233763
  • Filename
    6233763