• DocumentCode
    2768487
  • Title
    Roles of high-fidelity acoustic modeling in robust speech recognition
  • Author
    Deng, Li
  • Author_Institution
    Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    1
  • Lastpage
    13
  • Abstract
    In this paper I argue that high-fidelity acoustic models have important roles to play in robust speech recognition in the face of the many sources of variability that plague current systems. The discussion of high-fidelity acoustic modeling is placed in the context of general statistical pattern recognition, in which the probabilistic-modeling component, which embeds partial and imperfect knowledge, is the fundamental building block that enables all other components, including the recognition error measure, the decision rule, and the training criterion. Within the session's theme of acoustic modeling and robust speech recognition, I advance my argument using two concrete examples. First, an acoustic-modeling framework that embeds knowledge of articulatory-like constraints is shown to account better for the speech variability arising from varying speaking behavior (e.g., speaking rate and style) than a framework without these constraints. This higher-fidelity acoustic model is implemented in a multi-layer dynamic Bayesian network, and computer simulation results are presented. Second, the variability of acoustically distorted speech in adverse environments can be represented more precisely and handled more effectively by using information about the phase asynchrony between the undistorted speech and the mixing noise than by ignoring that information. This high-fidelity, phase-sensitive acoustic distortion model is integrated into the same multi-layer dynamic Bayesian network, but at layers that are separate from, and causally related to, those representing the speaking-behavior variability. Related experimental results in the literature are reviewed, providing empirical support for the significant role that the phase-sensitive model plays in environment-robust speech recognition.
  • Keywords
    Acoustic distortion; Acoustic measurements; Bayesian methods; Computer simulation; Concrete; Context modeling; Pattern recognition; Robustness; Speech enhancement; Speech recognition; acoustic modeling; dynamic Bayesian network; generative modeling; high fidelity; noise robustness; phase asynchrony; speaking behavior; variability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007
  • Conference_Location
    The Westin Miyako Kyoto
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type
    conf
  • DOI
    10.1109/ASRU.2007.4430075
  • Filename
    4430075
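The abstract's second example rests on the fact that the noisy power spectrum equals the sum of the speech and noise power spectra only if the cross term depending on their relative phase is dropped. As a hedged illustration only, the sketch below uses a form of the phase-sensitive model commonly written in the robust-ASR literature (assumed here, and not necessarily the exact formulation used in this paper): for one filterbank channel, with clean-speech log-power x, noise log-power n, and a phase factor alpha in [-1, 1], the noisy log-power is y = x + log(1 + exp(n - x) + 2*alpha*exp((n - x)/2)). Channel distortion is omitted, and the function name phase_sensitive_log_power is hypothetical.

```python
import numpy as np


def phase_sensitive_log_power(x, n, alpha):
    """Noisy log-power under an assumed phase-sensitive additive-noise model.

    Implements exp(y) = exp(x) + exp(n) + 2 * alpha * exp((x + n) / 2)
    per filterbank channel, where x and n are clean-speech and noise
    log-powers and alpha in [-1, 1] is a hypothetical per-channel phase
    factor standing in for cos(theta). Channel distortion is omitted.
    """
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    d = n - x  # noise-to-speech log-power ratio
    # Factor out exp(x): y = x + log(1 + exp(d) + 2 * alpha * exp(d / 2)).
    # The log argument is >= 0 for alpha >= -1; it is 0 only if alpha = -1 and d = 0.
    inner = 1.0 + np.exp(d) + 2.0 * alpha * np.exp(d / 2.0)
    return x + np.log(inner)


if __name__ == "__main__":
    x = np.array([0.0, 2.0, -1.0])  # clean-speech log-powers (illustrative)
    n = np.array([0.0, 0.0, 1.0])   # noise log-powers (illustrative)
    y_insensitive = phase_sensitive_log_power(x, n, alpha=0.0)
    y_sensitive = phase_sensitive_log_power(x, n, alpha=0.5)
    print("alpha=0.0:", y_insensitive)
    print("alpha=0.5:", y_sensitive)
    print("difference:", y_sensitive - y_insensitive)
```

Setting alpha = 0 recovers the phase-insensitive log-sum (equal to np.logaddexp(x, n)), so the gap between the two outputs is a rough measure of the error introduced by ignoring phase; that gap is largest when speech and noise have similar power (n close to x). Very large values of n - x can overflow np.exp in this simple, unstabilized form.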