DocumentCode :
183321
Title :
Open-Lexicon Language Modeling Combining Word and Character Levels
Author :
Kozielski, Michal ; Matysiak, Martin ; Doetsch, Patrick ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Human Language Technol. & Pattern Recognition Group, RWTH Aachen Univ., Aachen, Germany
fYear :
2014
fDate :
1-4 Sept. 2014
Firstpage :
343
Lastpage :
348
Abstract :
In this paper, we investigate different n-gram language models that are defined over an open lexicon. We introduce a character-level language model and combine it with a standard word-level language model in a back-off fashion. The character-level language model is redefined and renormalized to assign zero probability to words from a fixed vocabulary. Furthermore, we present a way to interpolate language models created at the word and character levels. The computation of character-level probabilities incorporates the across-word context. We compare perplexities on all words from the test set, and on in-lexicon and OOV words separately, on corpora of English and Arabic text.
Keywords :
natural language processing; probability; Arabic text; English text; OOV words; across-word context; backoff fashion; character-level language model; character-level probabilities; language models; n-gram language models; open-lexicon language modeling; word-level language model; zero probability; Computational modeling; Context; Interpolation; Speech recognition; Standards; Training; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location :
Heraklion
ISSN :
2167-6445
Print_ISBN :
978-1-4799-4335-7
Type :
conf
DOI :
10.1109/ICFHR.2014.64
Filename :
6981043