DocumentCode
1208821
Title
Importance of High-Order N-Gram Models in Morph-Based Speech Recognition
Author
Hirsimäki, Teemu ; Pylkkönen, Janne ; Kurimo, Mikko
Author_Institution
Adaptive Inf. Res. Center, Helsinki Univ. of Technol., Espoo
Volume
17
Issue
4
fYear
2009
fDate
5/1/2009 12:00:00 AM
Firstpage
724
Lastpage
732
Abstract
Speech recognition systems trained for morphologically rich languages face the problem of vocabulary growth caused by prefixes, suffixes, inflections, and compound words. Solutions proposed in the literature include increasing the size of the vocabulary and segmenting words into morphs. However, in many cases, these methods have only been tested with low-order n-gram models or compared with word-based models that do not have very large vocabularies. In this paper, we study the importance of using high-order variable-length n-gram models when the language models are trained over morphs instead of whole words. Language models trained on a very large vocabulary are compared with models based on different morph segmentations. Speech recognition experiments are carried out on two highly inflecting and agglutinative languages, Finnish and Estonian. The results suggest that high-order models can be essential in morph-based speech recognition, even when lattices are generated for two-pass recognition. The analysis of recognition errors reveals that the high-order morph language models especially improve the recognition of previously unseen words.
Keywords
natural language processing; speech recognition; Estonian language; Finnish language; high-order morph language model; high-order n-gram model; morph-based speech recognition; variable-length n-gram model; Decoding; Error analysis; Face recognition; Informatics; Lattices; Learning systems; Morphology; Natural languages; Speech recognition; Vocabulary; Language modeling (LM); morphology; speech recognition; variable-length n-grams;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
ieee
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2008.2012323
Filename
4806279