Title :
A hybrid statistical model to generate pronunciation variants of words
Author :
Vazirnezhad, Bahram ; Almasganj, Farshad ; Bijankhan, Mahmood
Author_Institution :
Biomed. Eng. Fac., Amirkabir Univ. of Technol., Tehran, Iran
fDate :
30 Oct.-1 Nov. 2005
Abstract :
Generating pronunciation variants of words is an important applied topic in speech research and is used extensively in automatic speech segmentation and recognition systems. Decision trees are widely used to model the pronunciation variants of words and sub-word units. For word units and very large vocabularies, training the necessary decision trees requires a huge number of speech utterances containing every needed word with a sufficient number of occurrences of each. Besides demanding a very large amount of data, this approach requires additional corpus material for each new word. To solve these problems we use generalized decision trees, in which each tree is trained for a group of words with similar phonemic structure instead of for a single word. These trees predict the regions of a word in which substitution, deletion, and insertion of phonemes are likely to occur. Appropriate statistical contextual rules, extracted from a large speech corpus, are then applied to these regions to generate the word variants. This new hybrid d-tree/c-rule approach takes word phonological structure, stress, and phone context information into account simultaneously, and a speech corpus of ordinary size is sufficient to train its models. Using the word variants obtained by this method in the lexicon of "SHENAVA", a Persian ACSR, yielded a relative word error rate (WER) reduction of up to 6%.
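The two-stage idea in the abstract (a tree flags regions of a word, then contextual rules are applied only in those regions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the rule table, its probabilities, the phone symbols, and the function names are all hypothetical, and the flagged positions that a generalized decision tree would predict are simply passed in as input.

```python
from itertools import product

# Hypothetical contextual rules: (left, phone, right) -> [(replacement, probability)].
# An empty replacement models a deletion; a longer one models an insertion.
# "#" marks a word boundary. Symbols and probabilities are made up for illustration.
RULES = {
    ("s", "t", "#"): [((), 0.3)],        # word-final /t/ deleted after /s/
    ("a", "n", "b"): [(("m",), 0.4)],    # /n/ realized as /m/ before /b/
}

def options_at(phones, i, flagged):
    """Return the (replacement, probability) choices for position i."""
    canonical = (phones[i],)
    if i not in flagged:
        # Outside the predicted regions the canonical phone is kept as-is.
        return [(canonical, 1.0)]
    # Context is read from the canonical neighbours, not from modified ones.
    left = phones[i - 1] if i > 0 else "#"
    right = phones[i + 1] if i + 1 < len(phones) else "#"
    alts = RULES.get((left, phones[i], right), [])
    keep = 1.0 - sum(p for _, p in alts)
    return [(canonical, keep)] + alts

def generate_variants(phones, flagged):
    """Enumerate variants and their probabilities, most probable first.

    `flagged` stands in for the positions a decision tree would predict.
    """
    per_position = [options_at(phones, i, flagged) for i in range(len(phones))]
    variants = {}
    for combo in product(*per_position):
        pron = tuple(ph for repl, _ in combo for ph in repl)
        prob = 1.0
        for _, p in combo:
            prob *= p
        # Different choices can yield the same phone string, so accumulate.
        variants[pron] = variants.get(pron, 0.0) + prob
    return sorted(variants.items(), key=lambda kv: -kv[1])
```

Calling `generate_variants(["d", "a", "s", "t"], {3})` on this toy rule table returns the canonical form together with a variant lacking the final phone, each with its probability. Restricting rule application to flagged positions is the design point the sketch tries to show: the rules stay word-independent, while the region predictor carries the word-structure information.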
Keywords :
decision trees; speech processing; speech recognition; statistical analysis; SHENAVA; automatic speech segmentation; c-rule; generalized decision trees; hybrid d-tree; hybrid statistical model; phonemic structure; speech corpus; speech recognition systems; statistical contextual rules; word pronunciation variants; Automatic speech recognition; Biomedical engineering; Biomedical signal processing; Context modeling; Data mining; Decision trees; Hybrid power systems; Speech processing; Stress; Vocabulary;
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598716