• DocumentCode
    3380978
  • Title
    Spell Checker for Thai Document
  • Author
    Watcharabutsarakham, Sarin
  • Author_Institution
    Nat. Electron. & Comput. Technol. Center, Pathumthani
  • fYear
    2005
  • fDate
    21-24 Nov. 2005
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The objective of OCR post-processing is to correct errors in the OCR output. A spell checker is an important tool for detecting and correcting misspelled words. This paper proposes a statistical method that finds unexpectedly frequent character sequences without relying on a dictionary. It is a flexible method for detecting out-of-vocabulary words. The corpus used to create the 3-grams belongs to NECTEC (National Electronic and Computer Technology Center). As a result, 3-grams are selected for use as the spelling checker for Thai documents. ArnThai, an OCR software package, is used to evaluate the proposed technique.
  • Keywords
    dictionaries; natural language processing; optical character recognition; statistical analysis; text analysis; ArnThai software; National Electronic and Computer Technology Center; OCR software; Thai document spell checker; dictionary; statistical method; unexpectedly frequent character sequences; vocabulary words; Decision support systems; Virtual reality; OCR; Spell checker; n-grams;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2005 2005 IEEE Region 10
  • Conference_Location
    Melbourne, Qld.
  • Print_ISBN
    0-7803-9311-2
  • Electronic_ISBN
    0-7803-9312-0
  • Type
    conf
  • DOI
    10.1109/TENCON.2005.301330
  • Filename
    4085130
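
The abstract describes a dictionary-free check based on character 3-grams built from a corpus. The sketch below is one plausible reading of that idea, not the paper's actual method: it counts overlapping character 3-grams in a reference corpus and flags 3-grams in the input whose corpus count falls below a threshold. The function names, the `min_count` parameter, and the "rare 3-gram means suspicious" rule are all assumptions made for illustration; the record does not state the scoring rule, and the English sample text stands in for the NECTEC Thai corpus.

```python
from collections import Counter


def char_trigrams(text):
    """Return the overlapping character 3-grams of text, in order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_trigram_counts(corpus_lines):
    """Count every character 3-gram across the lines of a reference corpus."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(char_trigrams(line))
    return counts


def suspicious_trigrams(text, counts, min_count=1):
    """Return (position, trigram) pairs whose corpus count is below min_count.

    Assumption: a 3-gram seen fewer than min_count times in the corpus marks
    a likely OCR or spelling error at that position.
    """
    return [
        (i, gram)
        for i, gram in enumerate(char_trigrams(text))
        if counts[gram] < min_count
    ]


# Hypothetical stand-in corpus; a real run would use a large Thai corpus.
corpus = ["the cat sat on the mat", "the hat"]
counts = build_trigram_counts(corpus)
clean = suspicious_trigrams("the cat", counts)
noisy = suspicious_trigrams("thx cat", counts)
```

Because the check works on raw character sequences, it needs no word segmentation, which is convenient for Thai, where words are not separated by spaces. Raising `min_count` makes the check stricter at the cost of more false alarms.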