Speech Separation Based on the Statistics of Binaural Auditory Features

Author

Brown, Guy J. ; Harding, Sue ; Barker, Jon P.

Author_Institution

Dept. of Comput. Sci., Sheffield Univ.

Volume

5

fYear

2006

fDate

14-19 May 2006

Abstract

A computational auditory scene analysis (CASA) system is described, in which sound separation according to spatial location is combined with the 'missing data' approach for automatic speech recognition. Time-frequency masks for the missing data recognizer are derived from the statistics of interaural time and level differences; these masks identify acoustic features that constitute reliable evidence of the target speech signal. It is demonstrated that this approach yields good performance in a challenging environment, in which a target voice is contaminated by another talker and reverberation. The ability of the system to generalize to source-receiver configurations that were not encountered during training is discussed.

Keywords

hearing; speech recognition; automatic speech recognition; binaural auditory features; computational auditory scene analysis; sound separation; speech separation; target speech signal; time-frequency masks; Automatic speech recognition; Ear; Humans; Image analysis; Reverberation; Robustness; Speech coding; Speech recognition; Statistics; Time frequency analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1661434

Filename

1661434