Title :
Using Morphological Information for Robust Language Modeling in Czech ASR System
Author :
Ircing, Pavel ; Psutka, Josef V. ; Psutka, Josef
Author_Institution :
Dept. of Cybern., Univ. of West Bohemia, Plzen
fDate :
5/1/2009
Abstract :
Automatic speech recognition of Czech, or more precisely its language modeling, faces challenges that are absent from the language modeling of English. The main ones are rapid vocabulary growth and the closely related unreliability of the language model parameter estimates. Both phenomena are caused mostly by the highly inflectional nature of Czech. On the other hand, the rich morphology, together with well-developed automatic systems for morphological tagging, can be exploited to reinforce the language model probability estimates. This paper shows that using rich morphological tags in a class-based n-gram language model with many-to-many word-to-class mapping, and combining this model with the standard word-based n-gram model, can improve recognition accuracy over the word-based baseline on the task of automatic transcription of unconstrained spontaneous Czech interviews.
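To make the idea of a class-based n-gram with many-to-many word-to-class mapping more concrete, the following is a minimal Python sketch for the bigram case. It is not the authors' implementation. The abstract does not say how the two models are combined, so linear interpolation with a weight `lam` is an assumption here. The tag names (`V`, `N4`), all function names, and all probabilities are invented for illustration only. The class-based probability sums over every tag pair, which is what allows one word to belong to several classes:

```python
# Toy sketch: class-based bigram with many-to-many word-to-tag mapping,
# linearly interpolated with a word bigram. All numbers are invented.

# P(tag | word): a word may carry several morphological tags (hypothetical tags).
P_TAG_GIVEN_WORD = {
    "vidím": {"V": 1.0},
    "ženu": {"N4": 0.7, "V": 0.3},
    "knihu": {"N4": 1.0},
}
# P(tag | previous tag): the class-level bigram.
P_TAG_GIVEN_TAG = {
    "V": {"N4": 0.6, "V": 0.4},
    "N4": {"N4": 0.5, "V": 0.5},
}
# P(word | tag): emission of a word from a class.
P_WORD_GIVEN_TAG = {
    "N4": {"ženu": 0.2, "knihu": 0.8},
    "V": {"vidím": 0.5, "ženu": 0.5},
}
# Word-based bigram P(word | previous word); "vidím knihu" is unseen here.
P_WORD_BIGRAM = {"vidím": {"ženu": 0.1}}


def class_bigram_prob(word, prev_word):
    """Sum P(prev_tag | prev_word) * P(tag | prev_tag) * P(word | tag) over tag pairs."""
    total = 0.0
    for prev_tag, p_prev in P_TAG_GIVEN_WORD.get(prev_word, {}).items():
        for tag, p_trans in P_TAG_GIVEN_TAG.get(prev_tag, {}).items():
            p_emit = P_WORD_GIVEN_TAG.get(tag, {}).get(word, 0.0)
            total += p_prev * p_trans * p_emit
    return total


def interpolated_prob(word, prev_word, lam=0.5):
    """Linear interpolation of the word bigram and the class-based bigram (assumed scheme)."""
    p_word = P_WORD_BIGRAM.get(prev_word, {}).get(word, 0.0)
    return lam * p_word + (1.0 - lam) * class_bigram_prob(word, prev_word)


if __name__ == "__main__":
    print(interpolated_prob("ženu", "vidím"))   # seen word bigram
    print(interpolated_prob("knihu", "vidím"))  # unseen word bigram, backed by tags
```

In this toy setup the word pair "vidím knihu" has no word-bigram count, yet the interpolated model still assigns it a nonzero probability through the tag sequence, which is the kind of reinforcement of sparse estimates the abstract describes.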
Keywords :
estimation theory; natural language processing; probability; speech recognition; Czech automatic speech recognition system; automatic transcription; class-based n-gram language model; language model probability estimation; many-to-many word-to-class mapping; morphological tagging; morphological tags; robust language modeling; vocabulary growth; Automatic speech recognition; Availability; Morphology; Natural language processing; Natural languages; Robustness; Speech recognition; Speech synthesis; Tagging; Vocabulary; Language models; speech recognition and synthesis;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2014217