Title :
Fast Selection of Small and Precise Candidate Sets from Dictionaries for Text Correction Tasks
Author :
Mihov, Stoyan ; Mitankin, Petar ; Schulz, Klaus U.
Author_Institution :
Bulgarian Acad. of Sci., Plovdiv
Abstract :
Lexical text correction relies on a central step where approximate search in a dictionary is used to select the best correction suggestions for an ill-formed input token. In previous work we introduced the concept of a universal Levenshtein automaton and showed how to use these automata to efficiently select from a dictionary all entries within a fixed Levenshtein distance of the garbled input word. In this paper we look at refinements of the basic Levenshtein distance that yield more sensible notions of similarity in distinct text correction applications, e.g. OCR. We show that the concept of a universal Levenshtein automaton can be adapted to these refinements. In this way we obtain a method for selecting correction candidates that is very efficient and, at the same time, selects small candidate sets with high recall.
Keywords :
dictionaries; text analysis; Levenshtein distance; Lexical text correction task; ill-formed input token; universal Levenshtein automaton; Automata; Automatic control; Computational Intelligence Society; Error correction; Frequency; Keyboards; Optical character recognition software; Testing
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4378754