• DocumentCode
    2216361
  • Title
    Recognizing broken characters in Thai historical documents
  • Author
    Sumetphong, Chaivatna; Tangwongsan, Supachai
  • Author_Institution
    Fac. of Inf. & Commun. Technol., Mahidol Univ., Bangkok, Thailand
  • Volume
    1
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Abstract
    One of the biggest challenges in restoring historical documents is achieving a high level of OCR accuracy. The main characteristic of these valuable but degraded documents is the abundance of broken characters. This paper formulates the problem as a mathematical model and proposes a novel solution based on set-partitions to recognize broken characters in Thai historical documents. Experiments based on this solution have been performed, and the results are very promising.
  • Keywords
    character recognition; document image processing; image restoration; mathematical analysis; natural language processing; OCR accuracy; Thai historical document; broken character recognition; degraded document; historical document restoration; mathematical model; set-partition; Broken Characters; Error Correction; Optical Character Recognition; Set-Partitions; Thai Historical Documents
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on
  • Conference_Location
    Chengdu
  • ISSN
    2154-7491
  • Print_ISBN
    978-1-4244-6539-2
  • Type
    conf
  • DOI
    10.1109/ICACTE.2010.5579053
  • Filename
    5579053