DocumentCode
3380978
Title
Spell Checker for Thai Document
Author
Watcharabutsarakham, Sarin
Author_Institution
Nat. Electron. & Comput. Technol. Center, Pathumthani
fYear
2005
fDate
21-24 Nov. 2005
Firstpage
1
Lastpage
4
Abstract
The objective of OCR post-processing is to correct errors in the OCR output, so a spell checker is an important tool for detecting and correcting misspelled words. This paper proposes a statistical method that finds unexpectedly frequent character sequences without relying on a dictionary, which makes it a flexible way to detect out-of-vocabulary words. The corpus used to create the 3-grams belongs to NECTEC (National Electronic and Computer Technology Center). The resulting 3-grams are selected for use as the spell checker for Thai documents. The proposed technique is evaluated with ArnThai, an OCR software package.
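The general idea of character 3-gram spell checking can be illustrated with a minimal sketch. It assumes a simple rule: any 3-gram in the OCR output that occurs fewer than `min_count` times in the training corpus is flagged as a likely error. The function names and the threshold are hypothetical, and this is not the paper's exact method. Because Thai is written without spaces between words, the sketch scans raw character sequences instead of segmented words.

```python
from collections import Counter


def char_trigrams(text: str) -> list[str]:
    # Overlapping character 3-grams. No word segmentation is needed,
    # which suits scripts such as Thai that omit spaces between words.
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_model(corpus: str) -> Counter:
    # Count every character 3-gram in the training corpus
    # (hypothetical stand-in for the NECTEC corpus).
    return Counter(char_trigrams(corpus))


def flag_positions(text: str, model: Counter, min_count: int = 1) -> list[int]:
    # Return the start index of each 3-gram in `text` that was seen
    # fewer than `min_count` times in the corpus (assumed threshold rule).
    return [i for i, tri in enumerate(char_trigrams(text)) if model[tri] < min_count]


if __name__ == "__main__":
    model = build_model("the cat sat on the mat")
    print(flag_positions("the cqt", model))  # [3, 4]
```

The OCR substitution of "q" for "a" creates two 3-grams the corpus never contains, so the flagged positions surround the error. Raising `min_count` makes the check stricter but flags more rare yet valid sequences.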
Keywords
dictionaries; natural language processing; optical character recognition; statistical analysis; text analysis; ArnThai software; National Electronic and Computer Technology Center; OCR software; Thai document spell checker; dictionary; statistical method; unexpectedly frequent character sequences; vocabulary words; Decision support systems; Virtual reality; OCR; Spell checker; n-grams;
fLanguage
English
Publisher
ieee
Conference_Titel
TENCON 2005 2005 IEEE Region 10
Conference_Location
Melbourne, Qld.
Print_ISBN
0-7803-9311-2
Electronic_ISBN
0-7803-9312-0
Type
conf
DOI
10.1109/TENCON.2005.301330
Filename
4085130