Title :
Memory-based context-sensitive spelling correction at web scale
Author :
Carlson, Andrew ; Fette, Ian
Author_Institution :
Carnegie Mellon Univ., Pittsburgh
Abstract :
We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from words which might have been intended in its place. Using a novel correction algorithm and a massive database of training data, we demonstrate higher accuracy on correcting real-word errors than previous work, and very high accuracy at a new task of ranking corrections to non-word errors given by a standard spelling correction package.
Keywords :
learning (artificial intelligence); natural language processing; spelling aids; text analysis; very large databases; Web text; context-sensitive spelling correction; memory-based learning techniques; spelling mistakes; token n-gram occurrences; very large database; Application software; Computer science; Databases; Dictionaries; Error correction; Machine learning; Packaging; Statistics; Testing; Training data
Conference_Title :
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location :
Cincinnati, OH
Print_ISBN :
978-0-7695-3069-7
DOI :
10.1109/ICMLA.2007.50