Title :
Single-channel multi-talker-localization based on maximum likelihood
Author :
Takashima, Ryoichi ; Takiguchi, Tetsuya ; Ariki, Yasuo
Author_Institution :
Grad. Sch. of Eng., Kobe Univ., Kobe, Japan
Abstract :
This paper presents a maximum-likelihood method for sound source (talker) localization that uses only a single microphone. In our previous work, we proposed GMM (Gaussian mixture model) separation for estimating the sound source direction, in which the observed (reverberant) speech is separated into the acoustic transfer function and the clean speech GMM, and showed its effectiveness for the single-talker localization task. In this paper, we discuss a multi-talker localization method that combines GMM separation with model composition. Model composition is used to represent the speech signals observed in a reverberant environment for every conceivable combination of sound source positions; each composite model is obtained by composing the talker's speech model with the acoustic transfer functions estimated using GMM separation. For each test data set, we find the maximum-likelihood model among the composite models, one for each combination of talkers' positions. The effectiveness of this method has been confirmed by two-talker localization experiments performed in a room environment.
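The final selection step described in the abstract, choosing the composite model with the highest likelihood for the test data, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each composite model is already available as a diagonal-covariance GMM, and it does not perform GMM separation or model composition. The function names `gmm_log_likelihood` and `localize`, the toy models, and the position labels in degrees are all hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames (T, D) under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                        # (T, K)
    log_joint = np.log(weights) + log_gauss
    # Log-sum-exp over mixture components, stabilized by the per-frame peak.
    peak = log_joint.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(log_frame.sum())

def localize(frames, composite_models):
    """Return the position combination whose model maximizes the likelihood."""
    scores = {
        positions: gmm_log_likelihood(frames, *model)
        for positions, model in composite_models.items()
    }
    return max(scores, key=scores.get), scores

def toy_model(offset):
    # Hypothetical stand-in for a composite model of one position pair.
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [2.0, 2.0]]) + offset
    variances = np.ones((2, 2))
    return weights, means, variances

# Hypothetical position pairs (degrees) mapped to toy composite models.
composite_models = {
    (30, 90): toy_model(0.0),
    (30, 150): toy_model(5.0),
    (90, 150): toy_model(-5.0),
}

rng = np.random.default_rng(0)
frames = rng.normal(loc=5.0, scale=1.0, size=(200, 2))  # synthetic test frames
best, scores = localize(frames, composite_models)
print(best)
```

In this toy setup the synthetic frames are drawn near a component mean of the model labeled `(30, 150)`, so that pair is selected. With real data, the composite models would come from the composition step described in the abstract, and the number of candidates grows with the number of position combinations considered.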
Keywords :
Gaussian processes; acoustic signal processing; direction-of-arrival estimation; maximum likelihood estimation; microphones; reverberation; signal representation; source separation; speech recognition; transfer functions; Gaussian mixture model; acoustic transfer function; clean speech GMM separation; maximum-likelihood estimation model; reverberant room environment; single microphone; single-channel multitalker-localization; single-talker localization; sound source direction estimation; sound source localization method; speech signal representation; talker speech model composition; two-talker localization experiment; Acoustic testing; Acoustical engineering; Cepstral analysis; Maximum likelihood estimation; Microphone arrays; Multiple signal classification; Phased arrays; Speech; Training data; Transfer functions; acoustic transfer function; maximum likelihood; model composition; single channel; talker localization;
Conference_Title :
Statistical Signal Processing, 2009. SSP '09. IEEE/SP 15th Workshop on
Conference_Location :
Cardiff
Print_ISBN :
978-1-4244-2709-3
Electronic_ISBN :
978-1-4244-2711-6
DOI :
10.1109/SSP.2009.5278540