  • DocumentCode
    337476
  • Title
    Improved methods for vocal tract normalization
  • Author
    Welling, L.; Kanthak, S.; Ney, H.
  • Author_Institution
    Tech. Hochschule Aachen, Germany
  • Volume
    2
  • fYear
    1999
  • fDate
    15-19 Mar 1999
  • Firstpage
    761
  • Abstract
    This paper presents improved methods for vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new method for VTN in training: by using acoustic models with single Gaussian densities per state for selecting the normalization scales, the need for the models to learn the normalization scales of the training speakers is avoided. We show that using single Gaussian densities for selecting the normalization scales in training results in lower error rates than using mixture densities. For VTN in recognition, we propose an improvement of the well-known multiple-pass strategy: by using an unnormalized acoustic model for the first recognition pass instead of a normalized model, lower error rates are obtained. In recognition tests, this method is compared with a fast variant of VTN. The multiple-pass strategy is an efficient method, but it is suboptimal because the normalization scale and the word sequence are determined sequentially. We found that for telephone digit string recognition this suboptimality reduces the VTN gain in recognition performance by 30% relative. On the German spontaneous scheduling task Verbmobil, the WSJ task, and the German telephone digit string corpus SieTill, the proposed methods for VTN reduce the error rates significantly.
  • Keywords
    Gaussian processes; speech processing; speech recognition; German spontaneous scheduling task; German telephone digit string corpus; SieTill; Verbmobil; WSJ task; databases; error rates; experimental tests; improved methods; mixture densities; multiple-pass strategy; normalization scales; recognition performance; recognition tests; single Gaussian densities; suboptimal method; telephone digit string recognition; training speakers; unnormalized acoustic model; vocal tract normalization; word sequence; Acoustic testing; Databases; Error analysis; Frequency; Loudspeakers; Performance gain; Piecewise linear techniques; Speech; Telephony
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-5041-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1999.759780
  • Filename
    759780