• DocumentCode
    2179153
  • Title
    Recent development of discriminative training using non-uniform criteria for cross-level acoustic modeling
  • Author
    Weng, Chao ; Juang, Biing-Hwang
  • Author_Institution
    Center for Signal & Image Process., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5332
  • Lastpage
    5335
  • Abstract
    In this paper, we extend our previous study on discriminative training using non-uniform criteria for speech recognition. The work emphasizes how acoustic modeling interacts with the risk at a higher level, which is more relevant to the most widely used evaluation measures, e.g., word error rate (WER). Specifically, the non-uniform error cost is first derived at the word level to minimize the risk w.r.t. WER and then computed on the word lattice using the forward-backward algorithm. With the statistics obtained from the forward-backward algorithm, the competing hypotheses for each label word are found by performing dynamic programming between the label word sequence and the word lattice at the phone level. To alleviate the level inconsistency between the acoustic model (phone level) and the evaluation measure (word level), the derived error cost is embedded into the overall objective function in a cross-level fashion. Experiments on the large-vocabulary task WSJ0 demonstrate the effectiveness of the overall approach, showing that it outperforms two prevalent discriminative training methods and achieves about 13% relative improvement over the baseline system.
  • Keywords
    dynamic programming; speech recognition; WER; cross-level acoustic modeling; discriminative training; forward-backward algorithm; nonuniform criteria; prevalent discriminative training methods; word error rate; Acoustics; Hidden Markov models; Lattices; Measurement uncertainty; Training; non-uniform error cost; word lattice
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947562
  • Filename
    5947562