Title :
Over-Generative Finite State Transducer N-Gram for Out-of-Vocabulary Word Recognition
Author :
Messina, Ronaldo ; Kermorvant, Christopher
Author_Institution :
A2iA S.A., Paris, France
Abstract :
Hybrid statistical grammars at both the word and character levels can be used to perform open-vocabulary recognition. This is usually done by allowing a special unknown-word symbol in the word-level grammar and dynamically replacing it with a (long) character-level n-gram, as the full transducer does not fit in the memory of most current computers. We present a modification of a finite-state transducer (FST) n-gram that enables the creation of a static transducer, for use when on-demand composition is not possible. By combining paths in the "LG" transducer (the composition of lexicon and n-gram), making it over-generative with respect to the n-grams observed in the corpus, it is possible to reduce the number of actual occurrences of the character-level grammar, so that the resulting transducer fits in the memory of practical machines. We evaluate this model for handwriting recognition using the RIMES and the IAM databases. We study its effect on the vocabulary size and show that this model is competitive with state-of-the-art solutions.
Keywords :
grammars; handwriting recognition; natural language processing; IAM; LG transducer; RIMES; character-level grammar; handwriting recognition; hybrid statistical grammars; n-gram; open-vocabulary recognition; out-of-vocabulary word recognition; over-generative finite state transducer; unknown-word; word-level grammar; Character recognition; Databases; Decoding; Grammar; Handwriting recognition; Transducers; Vocabulary; Finite state transducer; Handwritten recognition; Out of Vocabulary modeling;
Conference_Title :
Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
Conference_Location :
Tours
Print_ISBN :
978-1-4799-3243-6
DOI :
10.1109/DAS.2014.24