DocumentCode :
2260794
Title :
Real-word spelling correction using Google Web 1T n-gram with backoff
Author :
Islam, Aminul ; Inkpen, Diana
Author_Institution :
Dept. of Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada
fYear :
2009
fDate :
24-27 Sept. 2009
Firstpage :
1
Lastpage :
8
Abstract :
We present a method for correcting real-word spelling errors using the Google Web 1T n-gram data set and a normalized and modified version of the longest common subsequence (LCS) string matching algorithm. Our method focuses mainly on improving correction recall (the fraction of errors corrected) while keeping correction precision (the fraction of suggestions that are correct) as high as possible. Evaluation results on a standard data set show that our method performs very well.
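The record does not spell out how the LCS measure is normalized or modified. The sketch below assumes one common normalization, the squared LCS length divided by the product of the two string lengths, which scores candidate words by string similarity in the range [0, 1]. It is an illustration under that assumption, not necessarily the paper's exact variant, and all function names are hypothetical. It also encodes the two evaluation measures exactly as the abstract defines them.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP, O(|a|*|b|))."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)       # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a char of a or of b
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """ASSUMED normalization: |LCS|^2 / (|a| * |b|); 1.0 means identical strings."""
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return n * n / (len(a) * len(b))


def correction_recall(errors_corrected: int, total_errors: int) -> float:
    """Fraction of real-word errors that were corrected."""
    return errors_corrected / total_errors if total_errors else 0.0


def correction_precision(correct_suggestions: int, total_suggestions: int) -> float:
    """Fraction of suggested corrections that are correct."""
    return correct_suggestions / total_suggestions if total_suggestions else 0.0


if __name__ == "__main__":
    # "piece" vs "peace": LCS is "pece" (length 4), so 4*4 / (5*5) = 0.64
    print(normalized_lcs("piece", "peace"))
    print(correction_recall(3, 4), correction_precision(3, 5))
```

Squaring the LCS length and dividing by both lengths penalizes candidates that share a subsequence but differ greatly in length, which is one reason this form is often preferred over dividing by a single length.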
Keywords :
Internet; search engines; spelling aids; string matching; text analysis; Google Web 1T n-gram data set; LCS; correction precision; correction recall; longest common subsequence string matching algorithm; real-word spelling error correction; Computer errors; Computer science; Dictionaries; Error correction; Humans; Learning systems; Machine learning; Machine learning algorithms; Performance evaluation; Voting; Google web 1T; Real-word; n-gram; spelling correction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
Type :
conf
DOI :
10.1109/NLPKE.2009.5313823
Filename :
5313823