• DocumentCode
    2445978
  • Title

    Improving the Robustness of Persian Large Vocabulary Continuous Speech Recognition System for Real Applications

  • Author

    Veisi, H. ; Sameti, H. ; Babaali, B. ; Hosseinzadeh, Kh. ; Manzuri, M.T.

  • Author_Institution
    Dept. of Comput. Eng., Sharif Univ. of Technol., Tehran
  • Volume
    1
  • fYear
    0
  • fDate
    0-0 0
  • Firstpage
    1293
  • Lastpage
    1297
  • Abstract
    In this paper, vocal tract length normalization (VTLN) and two adaptation methods, MLLR and MAP, were investigated to make a Persian HMM-based, speaker-independent large vocabulary continuous speech recognition system robust. Robustness to speaker variation and environmental noise was achieved for real-world applications of this system. In the VTLN method, a line-search-based approach was used to find the speakers' relative warping factors. The factors were applied to the signal's spectrum to normalize the variations in vocal tract length between speakers. In the MLLR method, Gaussian mean and variance transformations in full adaptation were tested, using regression tree-based adaptation in a supervised fashion. Standard MAP was also tested as an adaptation method to compensate for speaker and environment variations. Combinations of these approaches were evaluated on four different noisy tasks. A significant improvement in recognition performance in noisy environments was achieved, which makes the system operational in real applications.
  • Keywords
    Gaussian processes; hidden Markov models; maximum likelihood estimation; natural languages; regression analysis; speech recognition; trees (mathematics); vocabulary; Gaussian mean transformation; Gaussian variance transformation; Persian hidden Markov model; Persian large vocabulary continuous speech recognition system; adaptation methods; line-search based approach; maximum a posteriori; maximum likelihood linear regression; regression tree-based adaptation; signal spectrum; speakers' relative warping factors; vocal tract length normalization; Acoustic noise; Degradation; Frequency estimation; Loudspeakers; Maximum likelihood linear regression; Noise robustness; Regression tree analysis; Speech recognition; Vocabulary; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technologies, 2006. ICTTA '06. 2nd
  • Conference_Location
    Damascus
  • Print_ISBN
    0-7803-9521-2
  • Type

    conf

  • DOI
    10.1109/ICTTA.2006.1684565
  • Filename
    1684565