• DocumentCode
    284609
  • Title
    Speaker normalization for speech recognition
  • Author
    Huang, Xuedong
  • Author_Institution
    Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • Volume
    1
  • fYear
    1992
  • fDate
    23-26 Mar 1992
  • Firstpage
    465
  • Abstract
    A codeword-dependent neural network (CDNN) is presented for the study of speaker adaptation. The CDNN is used as a nonlinear mapping function to transform speech data between two speakers. The mapping function is characterized by a number of important properties. First, the assembly of mapping functions enhances overall mapping quality. Second, multiple input vectors are used simultaneously in the transformation. This not only makes full use of dynamic information but also alleviates possible errors in the supervision data. Finally, the mapping function is derived from training data, with the quality dependent on the available amount of training data. Based on speaker-dependent models, performance evaluation showed that speaker normalization significantly reduced the error rate from 41.9% to 5.0%.
  • Keywords
    learning (artificial intelligence); neural nets; speech recognition; codeword-dependent neural network; dynamic information; multiple input vectors; nonlinear mapping function; overall mapping quality; performance evaluation; speaker adaptation; speaker normalization; speech data transformation; supervision data; training data; Assembly; Computer science; Databases; Error analysis; Loudspeakers; Management training; Neural networks; Speech recognition; Speech synthesis; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-0532-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.1992.225871
  • Filename
    225871