Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context

Author

Karim Youssef;Katsutoshi Itoyama;Kazuyoshi Yoshii

Author_Institution

Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan

fYear

2015

Firstpage

407

Lastpage

412

Abstract

This paper presents a method for identifying one or two concurrent speakers and estimating their azimuths from simultaneous utterances. The method is applicable to human-machine interaction and robot audition. Identification and localization have rarely been addressed jointly, and related works rely on time-frequency exploitation strategies to extract and process each source's contribution to the received signal. The presented method is trained with one speaker at a time, yet it can exploit a single speech segment to identify and localize two speakers. A binaural front-end based on cochlear filtering extracts equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCCs) and interaural level difference (ILD) features. Artificial neural networks (ANNs) exploit the ERBFCCs to provide identity information, and a histogram-based exploitation of the ILDs provides azimuth angle information. The method was evaluated in conditions that included overlapping segments, noise, and sound reflections, and its efficiency was demonstrated. Even with fully overlapping utterances, we reached an 83% identification rate for both speakers, an 82% estimation accuracy for both azimuths, and a 68% rate of correct joint identity and azimuth estimation. At least one speaker was correctly identified and localized in more than 99% of the tests for utterances lasting close to 5 s.
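
The ILD and histogram steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it computes a broadband ILD per frame rather than per cochlear filter channel, and it simply returns the most populated histogram bins instead of performing true peak picking. The function names `frame_ild_db` and `histogram_peaks`, the frame length, and the bin edges are all assumptions made for illustration. Mapping an ILD value to an azimuth angle would additionally need a table learned from single-speaker training data, which is not shown here.

```python
import numpy as np


def frame_ild_db(left, right, frame_len=512, eps=1e-12):
    """Per-frame interaural level difference in dB (left energy over right energy).

    Hypothetical helper: broadband energy ratio, no cochlear filterbank.
    """
    n_frames = min(len(left), len(right)) // frame_len
    # Cut both ear signals into non-overlapping frames of equal length.
    l = np.reshape(left[: n_frames * frame_len], (n_frames, frame_len))
    r = np.reshape(right[: n_frames * frame_len], (n_frames, frame_len))
    # eps keeps the ratio finite for silent frames.
    e_l = np.sum(l ** 2, axis=1) + eps
    e_r = np.sum(r ** 2, axis=1) + eps
    return 10.0 * np.log10(e_l / e_r)


def histogram_peaks(ild_db, n_peaks=2, bin_edges=None):
    """Return the centers and counts of the n_peaks most populated ILD bins.

    Assumption: 1 dB bins from -20 dB to +20 dB; values outside are ignored.
    """
    if bin_edges is None:
        bin_edges = np.arange(-20.0, 20.5, 1.0)
    counts, edges = np.histogram(ild_db, bins=bin_edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Sort bins by descending count and keep the first n_peaks.
    order = np.argsort(counts)[::-1][:n_peaks]
    return centers[order], counts[order]
```

For example, calling `frame_ild_db(left, right)` on a segment and passing the result to `histogram_peaks(..., n_peaks=2)` yields up to two candidate ILD values, one per assumed speaker. Adjacent bins belonging to the same source can both be returned, so a practical system would need proper peak detection.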

Keywords

"Feature extraction","Azimuth","Ear","Robots","Speech","Acoustics","Estimation"

Publisher

ieee

Conference_Titel

Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on

Type

conf

DOI

10.1109/SMC.2015.82

Filename

7379214