• DocumentCode
    865874
  • Title

    Robust speaker's location detection in a vehicle environment using GMM models

  • Author

    Hu, Jwu Sheng ; Cheng, Chieh Cheng ; Liu, Wei Han

  • Author_Institution
    Dept. of Electr. & Control Eng., Nat. Chiao-Tung Univ., Hsinchu, Taiwan
  • Volume
    36
  • Issue
    2
  • fYear
    2006
  • fDate
    4/1/2006
  • Firstpage
    403
  • Lastpage
    412
  • Abstract
    Human-computer interaction (HCI) using speech communication is becoming increasingly important, especially in driving, where safety is the primary concern. Knowing the speaker's location (i.e., speaker localization) not only improves the enhancement of a corrupted signal, but also assists speaker identification. Because conventional speech localization algorithms suffer from the uncertainties of environmental complexity and noise, as well as from the microphone mismatch problem, they are frequently not robust in practice. Without high reliability, speech-based HCI will never gain acceptance. This work presents a novel speaker's location detection method and demonstrates high accuracy within a vehicle cabin using a single linear microphone array. The proposed approach utilizes Gaussian mixture models (GMM) to model the distributions of the phase differences among the microphones caused by the complex characteristics of room acoustics and microphone mismatch. The model can be applied in both near-field and far-field situations in a noisy environment. Each Gaussian component of a GMM represents a general phase difference distribution that is location-dependent but independent of content and speaker. Moreover, the scheme performs well not only in non-line-of-sight cases, but also when the speakers are aligned toward the microphone array but at different distances from it. This strong performance is achieved by exploiting the fact that the phase difference distributions at different locations are distinguishable in the environment of a car. The experimental results also show that the proposed method outperforms the conventional multiple signal classification (MUSIC) technique at various SNRs.
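The idea described in the abstract (one GMM per candidate location, fitted to inter-microphone phase differences, with detection by maximum likelihood) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic phase-difference vectors, the seat names, the diagonal covariances, the plain EM training loop, and the helper names `log_joint`, `log_sum_exp`, `fit_gmm`, `locate`, and `synthetic_frames`. Phase wrapping is also ignored here.

```python
import numpy as np

def log_joint(X, weights, means, var):
    """Per-frame, per-component log(weight * Gaussian density); shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff**2 / var, axis=2)
                        + np.sum(np.log(2 * np.pi * var), axis=1))
    return np.log(weights) + log_gauss

def log_sum_exp(log_p):
    """Numerically stable log of the sum over components; shape (n,)."""
    m = log_p.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))

def fit_gmm(X, k=2, iters=50, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (illustrative training only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].copy()
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        log_p = log_joint(X, weights, means, var)
        resp = np.exp(log_p - log_sum_exp(log_p)[:, None])
        # M-step: re-estimate weights, means, and variances (with a floor).
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
    return weights, means, var

def locate(frames, models):
    """Return the location whose GMM gives the highest total log-likelihood."""
    scores = {loc: float(log_sum_exp(log_joint(frames, *gmm)).sum())
              for loc, gmm in models.items()}
    return max(scores, key=scores.get)

def synthetic_frames(centers, n, rng, spread=0.1):
    """Hypothetical phase-difference vectors scattered around a few centers."""
    centers = np.asarray(centers, dtype=float)
    picks = rng.integers(0, len(centers), size=n)
    return centers[picks] + spread * rng.normal(size=(n, centers.shape[1]))

# Assumed setup: 3 microphone pairs, two seats, two phase clusters per seat.
rng = np.random.default_rng(42)
seats = {
    "driver": [[0.5, 1.0, 1.5], [0.7, 1.2, 1.7]],
    "passenger": [[-0.5, -1.0, -1.5], [-0.8, -1.1, -1.4]],
}
models = {seat: fit_gmm(synthetic_frames(c, 400, rng)) for seat, c in seats.items()}

test_frames = synthetic_frames(seats["passenger"], 50, rng)
print(locate(test_frames, models))
```

Scoring a block of frames jointly rather than one frame at a time is a design choice of this sketch; it reflects the abstract's point that the phase difference distributions at different locations are distinguishable, so accumulated likelihood separates the candidates.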
  • Keywords
    Gaussian processes; human computer interaction; microphone arrays; signal classification; speaker recognition; GMM model; Gaussian mixture model; HCI; driving safety; environmental complexity; linear microphone array; microphone mismatch problem; multiple signal classification method; robust speaker location detection method; room acoustics; speaker identification; speech communication; speech localization algorithm; speech-based human-computer interaction; vehicle cabinet; Human computer interaction; Microphone arrays; Multiple signal classification; Oral communication; Robustness; Safety; Signal processing; Speech enhancement; Vehicle detection; Working environment noise; Gaussian mixture models (GMM); human–computer interaction (HCI); microphone array; sound localization; Acoustics; Algorithms; Artificial Intelligence; Computer Simulation; Data Interpretation, Statistical; Ecosystem; Humans; Models, Statistical; Normal Distribution; Sound Localization; Sound Spectrography; Transportation;
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type

    jour

  • DOI
    10.1109/TSMCB.2005.859084
  • Filename
    1605386