DocumentCode :
3585077
Title :
A word-level token-passing decoder for subword n-gram LVCSR
Author :
Varjokallio, Matti ; Kurimo, Mikko
Author_Institution :
Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland
fYear :
2014
Firstpage :
495
Lastpage :
500
Abstract :
The decoder is a key component of any modern speech recognizer. Morphologically rich languages pose special challenges for decoder design, since a very large recognition vocabulary is required to avoid high out-of-vocabulary (OOV) rates. To alleviate these issues, n-gram models are often trained over subwords instead of words; a subword n-gram model can assign probabilities to unseen word forms. We review token-passing decoding and propose a novel way of building the decoding graph for subword n-grams at the word level. This approach gives better control over the recognition vocabulary, including the removal of nonsense words and the possibility of adding important OOV words to the graph. The different decoders are evaluated in a Finnish large vocabulary continuous speech recognition (LVCSR) task.
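The abstract describes token-passing decoding over a word-level graph in which each word is expanded into subword units and scored with a subword n-gram model. Purely as an illustration of that idea, the minimal Python sketch below passes tokens through a toy lexicon with a toy subword bigram LM and Viterbi recombination; the lexicon, probabilities, acoustic placeholder, and one-subword-per-frame timing are assumptions for the example, not the paper's actual decoder.

# A rough, hypothetical sketch of word-level token passing with a subword
# bigram LM (invented lexicon, probabilities, and acoustic placeholder;
# not the authors' implementation). Each word in the graph is expanded
# into its subword units, and tokens are recombined Viterbi-style.
import math

# Toy lexicon: each in-vocabulary word as a sequence of subword units.
subword_lexicon = {
    "talo": ["ta", "lo"],
    "talossa": ["ta", "lo", "ssa"],
}

# Toy subword bigram LM as log-probabilities; the cross-word entry
# ("lo", "ta") shows the subword LM spanning word boundaries.
bigram_lm = {
    ("<s>", "ta"): math.log(0.5),
    ("ta", "lo"): math.log(0.6),
    ("lo", "ssa"): math.log(0.3),
    ("lo", "ta"): math.log(0.2),
}

def lm_score(prev_unit, unit):
    # A small floor probability stands in for back-off smoothing.
    return bigram_lm.get((prev_unit, unit), math.log(1e-4))

def acoustic_score(frame, unit):
    # Placeholder acoustic log-likelihood; a real decoder would query
    # an HMM/DNN acoustic model here.
    return -1.0

def decode(frames):
    # Token = (log-score, word history); keyed by (previous subword unit,
    # current word, position inside that word). For simplicity the sketch
    # consumes one subword per frame instead of modeling state durations.
    tokens = {("<s>", None, 0): (0.0, [])}
    for frame in frames:
        new_tokens = {}
        for (prev_unit, word, pos), (score, hist) in tokens.items():
            # Continue the current word, or start any word at a word boundary.
            candidates = [(word, pos)] if word else [(w, 0) for w in subword_lexicon]
            for w, p in candidates:
                unit = subword_lexicon[w][p]
                s = score + acoustic_score(frame, unit) + lm_score(prev_unit, unit)
                done = p + 1 == len(subword_lexicon[w])
                key = (unit, None, 0) if done else (unit, w, p + 1)
                h = hist + [w] if done else hist
                if key not in new_tokens or s > new_tokens[key][0]:
                    new_tokens[key] = (s, h)  # keep only the best token per node
        tokens = new_tokens
    return max(tokens.values(), key=lambda t: t[0])

print(decode(range(3)))  # e.g. (-5.41..., ['talossa']) with these toy scores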
Keywords :
natural language processing; protocols; speech coding; speech recognition; vocabulary; word processing; Finnish large vocabulary continuous speech recognition task; OOV rates; OOV-words; out-of-vocabulary rates; recognition vocabulary; speech recognizer; subword n-gram LVCSR; subword n-gram model; word-level token-passing decoder; Abstracts; Hidden Markov models; Decoding; Large Vocabulary Continuous Speech Recognition; Subword n-grams; Token-passing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Spoken Language Technology Workshop (SLT), 2014 IEEE
Type :
conf
DOI :
10.1109/SLT.2014.7078624
Filename :
7078624