• DocumentCode
    3727977
  • Title
    Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context
  • Author
    Karim Youssef;Katsutoshi Itoyama;Kazuyoshi Yoshii
  • Author_Institution
    Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan
  • fYear
    2015
  • Firstpage
    407
  • Lastpage
    412
  • Abstract
    This paper presents a method for identifying one or two concurrent speakers and estimating their azimuths from simultaneous utterances. The method is applicable to human-machine interaction and robot audition. Identification and localization have rarely been addressed jointly, and related works rely on time-frequency exploitation strategies to extract and process each source's contribution to the received signal. The presented method is trained with one speaker at a time, yet it can exploit a single speech segment to identify and localize two speakers. A binaural front end based on cochlear filtering extracts equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCCs) and interaural level difference (ILD) features. Artificial neural networks (ANNs) exploit the ERBFCCs to provide identity information, and a histogram-based exploitation of the ILDs provides azimuth angle information. The method was evaluated in conditions that included overlapping segments, noise, and sound reflections, and its efficiency was demonstrated. Even with fully overlapping utterances, we reached an 83% identification rate for both speakers, an 82% estimation accuracy for both azimuths, and a 68% rate of correct joint identity and azimuth estimation. At least one speaker was correctly identified and localized in more than 99% of the tests for utterances lasting close to 5 s.
  • Keywords
    "Feature extraction","Azimuth","Ear","Robots","Speech","Acoustics","Estimation"
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/SMC.2015.82
  • Filename
    7379214