DocumentCode
3059529
Title
Memory-based context-sensitive spelling correction at web scale
Author
Carlson, Andrew ; Fette, Ian
Author_Institution
Carnegie Mellon Univ., Pittsburgh
fYear
2007
fDate
13-15 Dec. 2007
Firstpage
166
Lastpage
171
Abstract
We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from words which might have been intended in its place. Using a novel correction algorithm and a massive database of training data, we demonstrate higher accuracy on correcting real-word errors than previous work, and very high accuracy at a new task of ranking corrections to non-word errors given by a standard spelling correction package.
Keywords
learning (artificial intelligence); natural language processing; spelling aids; text analysis; very large databases; Web text; context-sensitive spelling correction; memory-based learning techniques; spelling mistakes; token n-gram occurrences; very large database; Application software; Computer science; Databases; Dictionaries; Error correction; Machine learning; Packaging; Statistics; Testing; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location
Cincinnati, OH
Print_ISBN
978-0-7695-3069-7
Type
conf
DOI
10.1109/ICMLA.2007.50
Filename
4457226