DocumentCode :
3727977
Title :
Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context
Author :
Karim Youssef;Katsutoshi Itoyama;Kazuyoshi Yoshii
Author_Institution :
Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan
fYear :
2015
Firstpage :
407
Lastpage :
412
Abstract :
This paper presents a method for identifying one or two concurrent speakers in simultaneous utterances and estimating their azimuths. The method is applicable to human-machine interaction and robot audition. Identification and localization have rarely been addressed jointly, and related works rely on time-frequency exploitation strategies to extract and treat each source's contribution to the received signal. The presented method is trained with one speaker at a time, but it can exploit a speech segment to identify and localize two speakers. A binaural front-end based on cochlear filtering extracts equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCCs) and interaural level difference (ILD) features. Artificial neural networks (ANNs) exploit the ERBFCCs to provide identity information, and a histogram-based exploitation of the ILDs provides azimuth angle information. The method was evaluated in contexts including overlapping segments in the presence of noise and sound reflections, and its efficiency was demonstrated. Even with fully overlapping utterances, we reached an 83% identification rate for both speakers, an 82% estimation accuracy for both azimuths, and a 68% rate of correct joint identity and azimuth estimation. At least one speaker was correctly identified and localized in more than 99% of the tests for utterances lasting close to 5 s.
Keywords :
"Feature extraction","Azimuth","Ear","Robots","Speech","Acoustics","Estimation"
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/SMC.2015.82
Filename :
7379214