• DocumentCode
    1478230
  • Title

    Convex Combination of Multiple Statistical Models With Application to VAD

  • Author

    Petsatodis, Theodoros ; Boukis, Christos ; Talantzis, Fotios ; Tan, Zheng-Hua ; Prasad, Ramjee

  • Author_Institution
    Center for TeleInFrastruktur (CTIF), Aalborg Univ., Aalborg, Denmark
  • Volume
    19
  • Issue
    8
  • fYear
    2011
  • Firstpage
    2314
  • Lastpage
    2327
  • Abstract
    This paper proposes a robust voice activity detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions - a Gaussian, a Laplacian, and a two-sided Gamma - to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.
  • Keywords
    audio signal processing; microphones; speech processing; statistical analysis; voice communication; VAD; adaptive threshold; convex combination; decision smoothing scheme; far field microphones; instantaneous audio input; intra frame correlation; multiple statistical models; noise conditions; reverberation conditions; robust voice activity detector; statistical characteristics; Adaptation model; Frequency domain analysis; Histograms; Laplace equations; Noise; Speech; Speech processing; Classification; convex combination; statistical models; voice activity detection (VAD);
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2011.2131131
  • Filename
    5737769
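
The convex combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the zero-mean variance-matched parameterizations, the Gamma shape of 0.5, the `eps` guard at zero, and the fixed weights are all assumptions made for illustration. The record does not specify the rule that adapts the weights to the input, so none is implemented here.

```python
import math


def gaussian_pdf(x, sigma):
    # Zero-mean Gaussian with standard deviation sigma.
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplacian_pdf(x, sigma):
    # Zero-mean Laplacian; scale b is chosen so the variance equals sigma^2.
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)


def two_sided_gamma_pdf(x, sigma, eps=1e-12):
    # Two-sided Gamma with shape 0.5 (an assumption); scale theta is chosen
    # so the variance equals sigma^2. The density is unbounded at x = 0,
    # so |x| is floored at eps (hypothetical guard) to keep the value finite.
    theta = 2.0 * sigma / math.sqrt(3.0)
    ax = max(abs(x), eps)
    return math.exp(-ax / theta) / (2.0 * math.sqrt(math.pi * theta * ax))


def convex_mixture_pdf(x, sigma, weights):
    # Convex combination: weights must be non-negative and sum to one.
    if (
        len(weights) != 3
        or any(w < 0.0 for w in weights)
        or not math.isclose(sum(weights), 1.0, abs_tol=1e-9)
    ):
        raise ValueError("weights must be three non-negative values summing to 1")
    components = (
        gaussian_pdf(x, sigma),
        laplacian_pdf(x, sigma),
        two_sided_gamma_pdf(x, sigma),
    )
    return sum(w * p for w, p in zip(weights, components))


if __name__ == "__main__":
    # Illustrative fixed weights; an adaptive scheme would update these per frame.
    print(convex_mixture_pdf(0.5, 1.0, (0.2, 0.3, 0.5)))
```

Because the weights are non-negative and sum to one, the combined value always lies between the smallest and largest component densities, and the mixture itself remains a valid density.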