• DocumentCode
    3077283
  • Title

    Turkish spelling error detection and correction by using word n-grams

  • Author

    Dalkiliç, Gökhan; Çebi, Yalçin

  • Author_Institution
    Comput. Eng. Dept., Dokuz Eylul Univ., Izmir, Turkey
  • fYear
    2009
  • fDate
    2-4 Sept. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    N-grams can be used for spell checking and correction. The first step is to extract the language-specific n-grams from a corpus. However, no corpus can be large enough to contain every possible word n-gram. Back-off smoothing is one technique for estimating the frequency of n-grams that do not appear in a corpus. Using the back-off technique and the Minimum Edit Distance (MED) algorithm, a program was developed to detect spelling errors and suggest corrections in a sentence typed in Turkish. The results were compared with those of the Microsoft Word 2003 proofing tools and were found to be much better.
  • Keywords
    C language; probability; text analysis; Turkish language; back-off smoothing technique; minimum edit distance algorithm; spelling error correction; spelling error detection; word n-grams; Bayesian methods; Computer errors; Computer languages; Error correction; Frequency estimation; Maximum likelihood estimation; Probability; Smoothing methods; Sparse matrices;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, 2009. ICSCCW 2009. Fifth International Conference on
  • Conference_Location
    Famagusta
  • Print_ISBN
    978-1-4244-3429-9
  • Electronic_ISBN
    978-1-4244-3428-2
  • Type

    conf

  • DOI
    10.1109/ICSCCW.2009.5379481
  • Filename
    5379481
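
The approach outlined in the abstract, candidate generation by minimum edit distance followed by ranking with back-off n-gram frequencies, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`edit_distance`, `train`, `backoff_score`, `suggest`), the use of word bigrams backing off to unigrams, the fixed back-off weight `alpha=0.4`, the distance threshold of 2, and the toy corpus. The abstract does not state which back-off variant or n-gram order the authors used, and the score here is a simple unnormalized back-off rather than a properly discounted one.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def train(sentences):
    """Count word unigrams and bigrams in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def backoff_score(prev_word, word, unigrams, bigrams, alpha=0.4):
    """Bigram relative frequency if the bigram was seen; otherwise back off
    to the unigram relative frequency scaled by alpha (an assumed weight)."""
    if prev_word is not None and bigrams[(prev_word, word)] > 0:
        return bigrams[(prev_word, word)] / unigrams[prev_word]
    return alpha * unigrams[word] / sum(unigrams.values())


def suggest(sentence, unigrams, bigrams, max_distance=2):
    """Replace each out-of-vocabulary word with the best-scoring known word
    within max_distance edits; leave it unchanged if no candidate exists."""
    corrected = []
    prev = None
    for word in sentence.lower().split():
        if word not in unigrams:
            candidates = [v for v in unigrams
                          if edit_distance(word, v) <= max_distance]
            if candidates:
                # Rank by back-off score, breaking ties by smaller distance.
                word = max(candidates,
                           key=lambda v: (backoff_score(prev, v, unigrams, bigrams),
                                          -edit_distance(word, v)))
        corrected.append(word)
        prev = word
    return " ".join(corrected)


# Toy corpus, invented for this example only.
corpus = ["bu bir kitap", "bu bir kalem", "o bir kitap"]
unigrams, bigrams = train(corpus)
print(suggest("bu bir kitab", unigrams, bigrams))
```

In this sketch a word counts as misspelled only when it is absent from the corpus vocabulary, so real-word errors are not detected. A fuller system would also need a much larger corpus and, for an agglutinative language such as Turkish, some treatment of suffixed word forms.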