• DocumentCode
    990173
  • Title
    Pronunciation Modeling With Reduced Confusion for Mandarin Chinese Using a Three-Stage Framework
  • Author
    Tsai, Ming-Yi ; Chou, Fu-Chiang ; Lee, Lin-shan
  • Author_Institution
    Graduate Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei
  • Volume
    15
  • Issue
    2
  • fYear
    2007
  • Firstpage
    661
  • Lastpage
    675
  • Abstract
    Multiple-pronunciation dictionaries have been found to be useful in pronunciation modeling for speech recognition. However, the extra pronunciation variants added to the dictionary inevitably increase the confusion among different words during recognition, and consequently limit the achievable improvement in recognition performance. This paper proposes a three-stage framework for Mandarin Chinese that automatically constructs the multiple-pronunciation dictionary while reducing the confusion it introduces. The proposed framework includes pronunciation generation (Stage 1), ranking (Stage 2), and pruning (Stage 3). New measures of confusability for multiple-pronunciation dictionaries were developed and shown to have a very strong correlation with recognition performance. With the proposed framework, it was shown that the measured confusability can be reduced and recognition performance improved stage by stage. All of the above findings were verified by a series of experiments performed on both planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora.
  • Keywords
    natural languages; speech recognition; Mandarin Chinese; multiple-pronunciation dictionary; pronunciation generation; pronunciation modeling; three-stage framework; Artificial neural networks; Automatic speech recognition; Decision trees; Dictionaries; Reliability engineering; Speech processing; Confusability; confusion; pronunciation variation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.876769
  • Filename
    4067053