DocumentCode :
3585077
Title :
A word-level token-passing decoder for subword n-gram LVCSR
Author :
Varjokallio, Matti ; Kurimo, Mikko
Author_Institution :
Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland
fYear :
2014
Firstpage :
495
Lastpage :
500
Abstract :
The decoder is a key component of any modern speech recognizer. Morphologically rich languages pose special challenges for decoder design, since a very large recognition vocabulary is required to avoid high out-of-vocabulary (OOV) rates. To alleviate these issues, n-gram models are often trained over subwords instead of words; a subword n-gram model can assign probabilities to unseen word forms. We review token-passing decoding and propose a novel way of building the decoding graph for subword n-grams at the word level. This approach gives better control over the recognition vocabulary, including the removal of nonsense words and the possibility of adding important OOV words to the graph. The different decoders are evaluated in a Finnish large vocabulary continuous speech recognition (LVCSR) task.
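The abstract describes token-passing decoding over a word-level graph in which each word is expanded into subword units and scored with a subword n-gram model. Purely as an illustration of that idea, the minimal Python sketch below passes tokens through a toy lexicon with a toy subword bigram LM and Viterbi recombination; the lexicon, probabilities, acoustic placeholder, and one-subword-per-frame timing are assumptions for the example, not the paper's actual decoder.

# A rough, hypothetical sketch of word-level token passing with a subword
# bigram LM (invented lexicon, probabilities, and acoustic placeholder;
# not the authors' implementation). Each word in the graph is expanded
# into its subword units, and tokens are recombined Viterbi-style.
import math

# Toy lexicon: each in-vocabulary word as a sequence of subword units.
subword_lexicon = {
    "talo": ["ta", "lo"],
    "talossa": ["ta", "lo", "ssa"],
}

# Toy subword bigram LM as log-probabilities; the cross-word entry
# ("lo", "ta") shows the subword LM spanning word boundaries.
bigram_lm = {
    ("<s>", "ta"): math.log(0.5),
    ("ta", "lo"): math.log(0.6),
    ("lo", "ssa"): math.log(0.3),
    ("lo", "ta"): math.log(0.2),
}

def lm_score(prev_unit, unit):
    # A small floor probability stands in for back-off smoothing.
    return bigram_lm.get((prev_unit, unit), math.log(1e-4))

def acoustic_score(frame, unit):
    # Placeholder acoustic log-likelihood; a real decoder would query
    # an HMM/DNN acoustic model here.
    return -1.0

def decode(frames):
    # Token = (log-score, word history); keyed by (previous subword unit,
    # current word, position inside that word). For simplicity the sketch
    # consumes one subword per frame instead of modeling state durations.
    tokens = {("<s>", None, 0): (0.0, [])}
    for frame in frames:
        new_tokens = {}
        for (prev_unit, word, pos), (score, hist) in tokens.items():
            # Continue the current word, or start any word at a word boundary.
            candidates = [(word, pos)] if word else [(w, 0) for w in subword_lexicon]
            for w, p in candidates:
                unit = subword_lexicon[w][p]
                s = score + acoustic_score(frame, unit) + lm_score(prev_unit, unit)
                done = p + 1 == len(subword_lexicon[w])
                key = (unit, None, 0) if done else (unit, w, p + 1)
                h = hist + [w] if done else hist
                if key not in new_tokens or s > new_tokens[key][0]:
                    new_tokens[key] = (s, h)  # keep only the best token per node
        tokens = new_tokens
    return max(tokens.values(), key=lambda t: t[0])

print(decode(range(3)))  # e.g. (-5.41..., ['talossa']) with these toy scores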
Keywords :
natural language processing; protocols; speech coding; speech recognition; vocabulary; word processing; Finnish large vocabulary continuous speech recognition task; OOV rates; OOV-words; out-of-vocabulary rates; recognition vocabulary; speech recognizer; subword n-gram LVCSR; subword n-gram model; word-level token-passing decoder; Abstracts; Hidden Markov models; Decoding; Large Vocabulary Continuous Speech Recognition; Subword n-grams; Token-passing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Spoken Language Technology Workshop (SLT), 2014 IEEE
Type :
conf
DOI :
10.1109/SLT.2014.7078624
Filename :
7078624