• DocumentCode
    2398505
  • Title
    Retrieval of degraded Chinese document based on fuzzy coding strategy
  • Author
    Xia Yong ; Jia Xu-Hui ; Wang Kuan-Quan
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2012
  • fDate
    19-20 May 2012
  • Firstpage
    261
  • Lastpage
    264
  • Abstract
    Because the recognition rate for degraded Chinese documents is low, retrieval performs poorly when it relies directly on OCR results. This paper presents a fuzzy coding strategy to improve retrieval performance: character classes with similar shapes are clustered, and each cluster is indexed by a pseudo code. To ease testing, the paper also presents a way to generate ground truth for imaged documents and to synthesize degraded document images. A real OCR text collection and two synthesized document image collections are used for performance evaluation, and the results confirm the validity of the method.
  • Keywords
    document image processing; fuzzy set theory; image coding; image retrieval; optical character recognition; OCR text collection; degraded Chinese document retrieval; fuzzy coding strategy; ground-truth generate; imaged document; pseudo code; synthesized degraded document image; Degradation; Encoding; Image retrieval; Indexing; Optical character recognition software; Performance evaluation; Text analysis; Retrieval of degraded Chinese document; Synthesis of degraded document; fuzzy coding strategy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems and Informatics (ICSAI), 2012 International Conference on
  • Conference_Location
    Yantai
  • Print_ISBN
    978-1-4673-0198-5
  • Type
    conf
  • DOI
    10.1109/ICSAI.2012.6223602
  • Filename
    6223602
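
The abstract describes clustering similarly shaped character classes and indexing them by a pseudo code. The following is a minimal sketch of that general idea, not the authors' implementation: the clusters in `SIMILAR_SHAPE_CLUSTERS` are hand-picked for illustration, the pseudo-code labels (`C0`, `C1`, ...) and all function names are hypothetical, and the bigram inverted index is an assumed indexing scheme that the record does not specify.

```python
from collections import defaultdict

# Hypothetical clusters of visually similar characters (hand-picked, not from the paper).
SIMILAR_SHAPE_CLUSTERS = [
    "己已巳",
    "未末",
    "日曰",
    "土士",
]


def build_code_table(clusters):
    """Map every clustered character to a shared pseudo code such as 'C0'."""
    table = {}
    for cluster_id, members in enumerate(clusters):
        for ch in members:
            table[ch] = f"C{cluster_id}"
    return table


def fuzzy_encode(text, table):
    """Replace each character by its cluster's pseudo code; other characters stay as-is."""
    return tuple(table.get(ch, ch) for ch in text)


def build_index(docs, table, n=2):
    """Inverted index from pseudo-code n-grams to ids of documents containing them."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        codes = fuzzy_encode(text, table)
        for i in range(len(codes) - n + 1):
            index[codes[i:i + n]].add(doc_id)
    return index


def search(query, index, table, n=2):
    """Return ids of documents that contain every pseudo-code n-gram of the query."""
    codes = fuzzy_encode(query, table)
    grams = [codes[i:i + n] for i in range(len(codes) - n + 1)]
    if not grams:  # query shorter than n characters
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


CODE_TABLE = build_code_table(SIMILAR_SHAPE_CLUSTERS)
# Simulated OCR output: in document 1 the character "未" was misread as "末".
OCR_DOCS = {1: "末来的土地", 2: "昨日的天气"}
INDEX = build_index(OCR_DOCS, CODE_TABLE)
```

In this sketch, `search("未来", INDEX, CODE_TABLE)` still finds document 1 even though the simulated OCR text contains "末来", because both characters share one pseudo code. The trade-off is that coarser clusters tolerate more OCR confusions but can also match unrelated words.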