DocumentCode
2955645
Title
An Optimal Approach Towards Recognizing Broken Thai Characters in OCR Systems
Author
Sumetphong, C. ; Tangwongsan, Supachai
Author_Institution
Fac. of Inf. & Commun. Technol., Mahidol Univ., Bangkok, Thailand
fYear
2012
fDate
3-5 Dec. 2012
Firstpage
1
Lastpage
5
Abstract
This paper presents a novel technique for recognizing broken Thai characters found in degraded Thai text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set-partition of the connected components, in which each subset yields a reconstructed Thai character. Given the non-linear nature of the objective function needed for optimal set-partitioning, we design an algorithm, which we call Heuristic Incremental Integer Programming (HIIP), that employs integer programming (IP) with an incremental approach and uses heuristics to hasten convergence. To generate corrected Thai words, we adopt a probabilistic generative approach based on a Thai dictionary corpus. The proposed technique is applied successfully to a Thai historical document and a poor-quality Thai fax document, with promising accuracy rates over 93%.
Keywords
convergence; integer programming; optical character recognition; probability; text analysis; HIIP; OCR systems; SPP; Thai dictionary corpus; Thai historical document; Thai text documents; Thai words; broken Thai character recognition; convergence; heuristic incremental integer programming; objective function; probabilistic generative approach; set-partitioning problem; Character recognition; IP networks; Image segmentation; Linear programming; Mathematical model; Optical character recognition software;
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Image Computing Techniques and Applications (DICTA), 2012 International Conference on
Conference_Location
Fremantle, WA
Print_ISBN
978-1-4673-2180-8
Electronic_ISBN
978-1-4673-2179-2
Type
conf
DOI
10.1109/DICTA.2012.6411736
Filename
6411736