Regional Information Center for Science and Technology (RICEST) - Learning a subword vocabulary based on unigram likelihood

DocumentCode :

672320

Title :

Learning a subword vocabulary based on unigram likelihood

Author :

Varjokallio, Matti ; Kurimo, Mikko ; Virpioja, Sami

Author_Institution :

Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

Lastpage :

Abstract :

Using words as vocabulary units for tasks like speech recognition is infeasible for many morphologically rich languages, including Finnish. Thus, subword units are commonly used for language modeling. This work presents a novel algorithm for creating a subword vocabulary, based on the unigram likelihood of a text corpus. The method is evaluated with an entropy measure and on a Finnish large-vocabulary continuous speech recognition (LVCSR) task. Unigram entropy of the text corpus is shown to be a good indicator of the quality of higher-order n-gram models, and the approach also yields high speech recognition accuracy.
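
As an illustration of the quantity the abstract refers to, the following minimal sketch scores a word under a fixed unigram subword model by finding its most likely segmentation with a Viterbi search. It is not the vocabulary-learning algorithm of the paper, which the abstract does not describe. The toy vocabulary `LOGP`, its probabilities, and the function names `viterbi_segment` and `corpus_log_likelihood` are all assumptions made for this example.

```python
import math

# Hypothetical toy subword vocabulary with unigram probabilities (sums to 1.0).
PROBS = {"talo": 0.2, "ssa": 0.2, "s": 0.2, "t": 0.1, "a": 0.1, "l": 0.1, "o": 0.1}
LOGP = {unit: math.log(p) for unit, p in PROBS.items()}


def viterbi_segment(word, logp):
    """Return (units, log_prob) for the most likely segmentation of `word`."""
    n = len(word)
    best = [0.0] + [-math.inf] * n  # best[i]: max log-prob of word[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last unit
    for end in range(1, n + 1):
        for start in range(end):
            unit = word[start:end]
            if unit in logp and best[start] + logp[unit] > best[end]:
                best[end] = best[start] + logp[unit]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with this vocabulary")
    # Follow the back-pointers from the end of the word to recover the units.
    units = []
    i = n
    while i > 0:
        units.append(word[back[i]:i])
        i = back[i]
    return units[::-1], best[n]


def corpus_log_likelihood(word_counts, logp):
    """Sum of Viterbi log-probabilities over a {word: count} corpus."""
    return sum(count * viterbi_segment(word, logp)[1]
               for word, count in word_counts.items())


units, score = viterbi_segment("talossa", LOGP)
total = corpus_log_likelihood({"talossa": 3, "talo": 2}, LOGP)
```

Under these assumed probabilities, "talossa" is split into "talo" and "ssa" because that path has a higher probability than any path through single letters. Dividing the negated corpus log-likelihood by the number of units or words gives an entropy-style measure for comparing candidate vocabularies.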

Keywords :

natural language processing; speech recognition; text analysis; vocabulary; Finnish; speech recognition; subword vocabulary; text corpus; unigram likelihood; vocabulary units; Entropy; Hidden Markov models; Speech; Speech recognition; Training; Viterbi algorithm; Vocabulary; Large Vocabulary Continuous Speech Recognition; Subword Modeling; Vocabulary Selection;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707697

Filename :

6707697

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672320