Keywords:
Large-vocabulary discrete-utterance speech recognition, semi-continuous hidden Markov modeling, speaker-independent large-vocabulary speech recognition
English abstract:
In designing and implementing classic real-time, speaker-independent, discrete-utterance speech recognition systems with a large vocabulary (1000 to 10000 words), one encounters two major problems: first, the time-consuming preparation of a large-vocabulary data set with a considerable number (100 to 10000) of speakers, which is needed to train the system satisfactorily and reliably; and second, the impossibility of running the recognition phase in real time on available personal computers. To solve these problems, we carried out an extensive and detailed study. Regarding the first problem, we prepared a large speech data set (50 to 60 pronunciations per word for each speaker) using 50 to 100 speakers chosen according to a special methodology (the number of males is 1.5 times the number of females). We then designed a speaker-dependent speech recognition system for each speaker and, through a special combination of reference speakers, obtained a speaker-independent speech recognition system with a recognition rate of 97.4% and a standard deviation of 2.1%. However, due to the high computational cost of the maximum likelihood (ML) training method, real-time implementation of the recognition phase is impossible. To solve this problem, we used several tied-mixture methods to represent the probability density functions (pdfs) of the HMM states. Finally, using tied-mixture methods, semi-continuous density (SCD) modeling, and fast search algorithms in the SCD codebook, we achieved real-time operation of our system during the recognition phase. Because these methods are sub-optimal, the recognition rate of the resulting system is 1.5 percentage points lower than the previous result. As a consequence, we obtained a speaker-independent speech recognition system with a recognition rate of 95.9% and a standard deviation of 2.8%.
In speaker-dependent mode, the recognition rate is 98.5% with a standard deviation of 1.2%. The system runs in real time when tested on a Pentium IV PC with a clock speed above 2.4 GHz and 512 MB of RAM.
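
The abstract does not describe how the tied-mixture (semi-continuous) densities are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes diagonal-covariance Gaussians, a toy three-entry codebook, and a simple "keep the top M codewords" shortcut standing in for the fast codebook search. All names (`codebook_densities`, `state_likelihood`, `top_m`) and all numbers are hypothetical. The point it illustrates is that every HMM state shares one codebook of Gaussians, so the densities are evaluated once per frame and each state adds only a weighted sum.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def codebook_densities(x, codebook, top_m=None):
    """Evaluate each shared Gaussian once per frame.

    If top_m is given, keep only the top_m most likely codewords
    (an assumed stand-in for a fast codebook search).
    """
    dens = {k: diag_gaussian_pdf(x, mean, var)
            for k, (mean, var) in enumerate(codebook)}
    if top_m is not None:
        best = sorted(dens, key=dens.get, reverse=True)[:top_m]
        dens = {k: dens[k] for k in best}
    return dens


def state_likelihood(weights, dens):
    """b_j(x) = sum_k c_jk * N(x; mu_k, var_k) over the retained codewords."""
    return sum(weights[k] * p for k, p in dens.items())


# Hypothetical toy data: three shared 2-D Gaussians as (mean, variance) pairs.
codebook = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 3.0], [1.0, 1.0]),
]
# Each state stores only its mixture weights over the shared codebook.
state_weights = {
    "s1": [0.8, 0.1, 0.1],
    "s2": [0.1, 0.8, 0.1],
}

frame = [0.2, -0.1]
dens = codebook_densities(frame, codebook, top_m=2)
scores = {s: state_likelihood(w, dens) for s, w in state_weights.items()}
```

In this sketch the per-frame cost is dominated by the single pass over the codebook; adding more states increases only the number of cheap weighted sums, which is the property that makes tied mixtures attractive for real-time recognition.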