DocumentCode :
3380978
Title :
Spell Checker for Thai Document
Author :
Watcharabutsarakham, Sarin
Author_Institution :
Nat. Electron. & Comput. Technol. Center, Pathumthani
fYear :
2005
fDate :
21-24 Nov. 2005
Firstpage :
1
Lastpage :
4
Abstract :
The objective of OCR post-processing is to correct errors in the OCR result. A spell checker is an important tool for detecting and correcting misspelled words. This paper proposes a statistical method that finds unexpectedly frequent character sequences without relying on a dictionary, which makes it a flexible way to detect out-of-vocabulary words. The corpus used to create the 3-grams belongs to NECTEC (National Electronics and Computer Technology Center). The resulting 3-grams are selected for use as the spelling checker for Thai documents. ArnThai, an OCR software package, is used to evaluate the proposed technique.
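The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way a dictionary-free character 3-gram check could work, not the paper's actual method. It assumes that 3-grams are counted over raw corpus text and that a position is flagged when its 3-gram occurs fewer than a chosen number of times. The function names, the `min_count` threshold, and the toy English corpus are all hypothetical; the same code operates on any Unicode string, including Thai text, which has no spaces between words.

```python
from collections import Counter


def char_trigrams(text):
    """Return every overlapping 3-character sequence in text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_trigram_counts(corpus_lines):
    """Count character 3-grams over an iterable of corpus strings."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(char_trigrams(line))
    return counts


def suspicious_positions(text, counts, min_count=1):
    """Return start offsets of 3-grams seen fewer than min_count times.

    Assumption: a rare or unseen 3-gram marks a likely OCR or spelling error.
    """
    return [
        i
        for i, gram in enumerate(char_trigrams(text))
        if counts[gram] < min_count
    ]


# Toy stand-in corpus (hypothetical); a real system would use a large corpus.
counts = build_trigram_counts(["spell checker", "spelling check"])
flagged = suspicious_positions("spell chekcer", counts)
```

In this toy run, `flagged` holds the offsets of the 3-grams that overlap the transposed letters in "chekcer", while text that matches the corpus produces no flags. A practical checker would still need a ranking step to propose corrections, which this sketch does not attempt.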
Keywords :
dictionaries; natural language processing; optical character recognition; statistical analysis; text analysis; ArnThai software; National Electronic and Computer Technology Center; OCR software; Thai document spell checker; dictionary; statistical method; unexpectedly frequent character sequences; vocabulary words; Decision support systems; Virtual reality; OCR; Spell checker; n-grams;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON 2005 - 2005 IEEE Region 10
Conference_Location :
Melbourne, Vic.
Print_ISBN :
0-7803-9311-2
Electronic_ISBN :
0-7803-9312-0
Type :
conf
DOI :
10.1109/TENCON.2005.301330
Filename :
4085130