• DocumentCode
    312499
  • Title
    A speaker adaptive Chinese syllable recognition system based on discriminative training
  • Author
    Zhou, Liang ; Imai, Satoshi
  • Author_Institution
    Precision & Intelligence Lab., Tokyo Inst. of Technol., Yokohama, Japan
  • Volume
    1
  • fYear
    1996
  • fDate
    26-29 Nov 1996
  • Firstpage
    31
  • Abstract
    We present two speaker adaptation methods to implement an MSVQ-based adaptive Chinese syllable recognition system. The first proposed method is feature normalization, in which we model the inter-speaker variability as a linear transformation. By applying feature normalization, the target speaker's speech is normalized to reduce the inter-speaker acoustic variability. In the second adaptation method, we first present an implementation of the MCE/GPD algorithm for discriminatively training an MSVQ-based speech recognizer. This method is expected to separate the confusion classes and enhance speaker adaptation capability. We carried out recognition experiments on CRDB, a standard Chinese syllable database in China, to assess performance. The results show that when both adaptation methods are combined, the error rate reduction on open data is over 62% with a single set of adaptation training data. When the amount of training data is increased, speaker adaptation capability improves using MCE/GPD training alone. After using 5 sets of training data, the average recognition rate for two new speakers improved from 72.87% to 97.31%, which is the best performance reported on this database.
  • Keywords
    adaptive signal processing; feature extraction; natural languages; speech coding; speech processing; speech recognition; vector quantisation; CRDB; China; Chinese syllable database; MCE/GPD algorithm; MSVQ based speech recognizer; adaptation training data; average recognition rate; confusion classes; discriminative training; error rate reduction; feature normalization; interspeaker acoustic variability; linear transformation; minimum classification error; performance; recognition experiments; speaker adaptation methods; speaker adaptive Chinese syllable recognition system; Cepstral analysis; Error analysis; Hidden Markov models; Laboratories; Loudspeakers; Natural languages; Spatial databases; Speech recognition; Vectors; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON '96. Proceedings., 1996 IEEE TENCON. Digital Signal Processing Applications
  • Conference_Location
    Perth, WA
  • Print_ISBN
    0-7803-3679-8
  • Type
    conf
  • DOI
    10.1109/TENCON.1996.608695
  • Filename
    608695