Improving the Robustness of Persian Large Vocabulary Continuous Speech Recognition System for Real Applications

Author

Veisi, H. ; Sameti, H. ; Babaali, B. ; Hosseinzadeh, Kh. ; Manzuri, M.T.

Author_Institution

Dept. of Comput. Eng., Sharif Univ. of Technol., Tehran

Volume

1

fYear

0

fDate

0-0 0

Firstpage

1293

Lastpage

1297

Abstract

In this paper, vocal tract length normalization (VTLN) and two adaptation methods, MLLR and MAP, were investigated to make a Persian HMM-based speaker-independent large vocabulary continuous speech recognition system robust. Robustness to speaker variation and environmental noise was achieved for real-world applications of this system. In the VTLN method, a line-search based approach was used to find the speakers' relative warping factors. The factors were applied to the signal's spectrum to normalize the variations in vocal tract length between speakers. In the MLLR method, Gaussian mean and variance transformations in full adaptation were examined, using regression tree-based adaptation in a supervised fashion. Standard MAP was also examined as an adaptation method to compensate for speaker and environment variations. Combinations of these approaches were evaluated on 4 different noisy tasks. A significant improvement in recognition performance in noisy environments was achieved, which makes the system operational in real applications.
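
The line search for a warping factor mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear warping function, the squared-error score (a stand-in for the acoustic likelihood an HMM recognizer would supply), the grid of candidate factors, and all function names are assumptions made for illustration only.

```python
import math


def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis: output bin k reads input position k * alpha."""
    last = len(spectrum) - 1
    warped = []
    for k in range(len(spectrum)):
        pos = min(k * alpha, last)      # clamp positions past the last bin
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring bins.
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def match_score(spectrum, reference):
    """Toy score (assumption): negative squared error against a reference spectrum.
    A real system would score warped features with its acoustic model instead."""
    return -sum((s - r) ** 2 for s, r in zip(spectrum, reference))


def line_search_warp(spectrum, reference, alphas):
    """Try every candidate factor and keep the one with the best score."""
    return max(alphas, key=lambda a: match_score(warp_spectrum(spectrum, a), reference))


# Synthetic example: a single spectral peak stands in for a formant.
reference = [math.exp(-0.5 * ((k - 40) / 5.0) ** 2) for k in range(128)]
speaker = warp_spectrum(reference, 1.1)          # simulated speaker with a shifted peak
alphas = [0.80 + 0.02 * i for i in range(21)]    # hypothetical grid, 0.80 to 1.20
best_alpha = line_search_warp(speaker, reference, alphas)
print(best_alpha)
```

In this toy setup the selected factor should land near 1/1.1, the value that undoes the simulated shift. The grid resolution bounds how closely the search can approach it.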

Keywords

Gaussian processes; hidden Markov models; maximum likelihood estimation; natural languages; regression analysis; speech recognition; trees (mathematics); vocabulary; Gaussian mean transformation; Gaussian variance transformation; Persian hidden Markov model; Persian large vocabulary continuous speech recognition system; adaptation methods; line-search based approach; maximum a posteriori; maximum likelihood linear regression; regression tree-based adaptation; signal's spectrum; speakers' relative warping factors; vocal tract length normalization; Acoustic noise; Degradation; Frequency estimation; Loudspeakers; Maximum likelihood linear regression; Noise robustness; Regression tree analysis; Speech recognition; Vocabulary; Working environment noise

fLanguage

English

Publisher

IEEE

Conference_Titel

Information and Communication Technologies, 2006. ICTTA '06. 2nd

Conference_Location

Damascus

Print_ISBN

0-7803-9521-2

Type

conf

DOI

10.1109/ICTTA.2006.1684565

Filename

1684565