DocumentCode :
788491
Title :
Speech Segregation Using an Auditory Vocoder With Event-Synchronous Enhancements
Author :
Irino, Toshio ; Patterson, Roy D. ; Kawahara, Hideki
Author_Institution :
Fac. of Syst. Eng., Wakayama Univ.
Volume :
14
Issue :
6
fYear :
2006
Firstpage :
2212
Lastpage :
2221
Abstract :
We propose a new method to segregate concurrent speech sounds using an auditory version of a channel vocoder. The auditory representation of sound, referred to as an "auditory image," preserves fine temporal information, unlike conventional window-based processing systems. This makes it possible to segregate speech sources with an event-synchronous procedure. Fundamental frequency information is used to estimate the sequence of glottal pulse times for a target speaker, and to repress the glottal events of other speakers. The procedure leads to robust extraction of the target speech and effective segregation even when the signal-to-noise ratio is as low as 0 dB. Moreover, the segregation performance remains high when the speech contains jitter, or when the estimate of the fundamental frequency F0 is inaccurate. This contrasts with conventional comb-filter methods, where errors in F0 estimation produce a marked reduction in performance. We compared the new method to a comb-filter method using a cross-correlation measure and perceptual recognition experiments. The results suggest that the new method has the potential to supplant comb-filter and harmonic-selection methods for speech enhancement.
Keywords :
feature extraction; speaker recognition; speech coding; speech enhancement; speech synthesis; vocoders; auditory image; auditory vocoder; channel vocoder; cross-correlation measure; event-synchronous enhancements; glottal pulse times; perceptual recognition experiments; signal-to-noise ratio; speech enhancement; speech segregation; speech sounds; target speaker synthesis; target speech robust extraction; window-based processing systems; Biomedical engineering; Data mining; Frequency estimation; Image analysis; Loudspeakers; Power harmonic filters; Signal to noise ratio; Speech enhancement; Systems engineering and theory; Vocoders; Auditory image; auditory scene analysis; channel vocoder; comb filter; pitch/F0 extraction
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2006.872611
Filename :
1709908