• DocumentCode
    1010080
  • Title
    Language model and speaking rate adaptation for spontaneous presentation speech recognition

  • Author
    Nanjo, Hiroaki ; Kawahara, Tatsuya

  • Author_Institution
    Graduate Sch. of Informatics, Kyoto Univ., Japan
  • Volume
    12
  • Issue
    4
  • fYear
    2004
  • fDate
    7/1/2004
  • Firstpage
    391
  • Lastpage
    400
  • Abstract
    The paper addresses methods of adapting to the language model and speaking rate (SR) of individual speakers, which are two major problems in automatic transcription of spontaneous presentation speech. To cope with the large variation in expression and pronunciation of words depending on the speaker, we first investigate the effect of statistical and context-dependent pronunciation modeling. Second, we present unsupervised methods of language model adaptation to a specific speaker and topic by 1) selecting similar texts based on word perplexity and a TF-IDF measure and 2) making direct use of the initial recognition result to generate an enhanced model. We confirm that all proposed adaptation methods and their combinations reduce the perplexity and word error rate. We also present a decoding strategy adapted to the SR. In spontaneous speech, SR is generally fast and may vary considerably. We also observe different error tendencies for portions of presentations where speech is fast or slow. Therefore, we propose an SR-dependent decoding strategy that applies the most appropriate acoustic analysis, phone models, and decoding parameters according to the SR. Several methods are investigated, and their selective application leads to improved accuracy. The combined effect of the two proposed adaptation methods is also confirmed in transcription of real academic presentations.
  • Keywords
    acoustic signal processing; error statistics; speaker recognition; statistical analysis; TF-IDF measure; acoustic analysis; context-dependent pronunciation modeling; decoding parameters; language model adaptation; phone models; speaking rate adaptation; speaking rate dependent decoding strategy; speech automatic transcription; spontaneous presentation speech recognition; statistical pronunciation modeling; word error rate; word perplexity; Adaptation model; Automatic speech recognition; Context modeling; Decoding; Large-scale systems; Natural languages; Speech analysis; Speech recognition; Strontium; Text recognition;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour

  • DOI
    10.1109/TSA.2004.828641
  • Filename
    1306512
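
The abstract mentions selecting texts similar to an initial recognition result using a TF-IDF measure. As a rough illustration only, the sketch below ranks candidate texts by TF-IDF cosine similarity to a hypothesis transcript. It is not the authors' implementation: the function names (`tfidf_vectors`, `cosine`, `select_similar`), the smoothed IDF formula, and the use of cosine similarity are all assumptions made for the example, and the perplexity-based criterion from the abstract is not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    # Smoothed IDF (an assumed variant) so that very common words keep a small weight.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(hypothesis, corpus, top_k=2):
    """Return indices of the top_k corpus texts most similar to the hypothesis."""
    vectors = tfidf_vectors([hypothesis] + corpus)
    query, candidates = vectors[0], vectors[1:]
    scores = [cosine(query, v) for v in candidates]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Hypothetical toy data: an initial recognition result and three candidate texts.
hypothesis = "speech recognition language model".split()
corpus = [
    "language model adaptation for speech recognition".split(),
    "protein folding simulation".split(),
    "acoustic model for speech".split(),
]
selected = select_similar(hypothesis, corpus, top_k=2)
```

In a setup like this, the selected texts would then serve as adaptation data for retraining or interpolating the language model; how the paper combines them with the baseline model is not stated in this record.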