Regional Information Center for Science and Technology - Speaker verification based processing for robust ASR in co-channel speech scenarios

DocumentCode :

178081

Title :

Speaker verification based processing for robust ASR in co-channel speech scenarios

Author :

Sadjadi, Seyed Omid ; Heck, Larry P.

Author_Institution :

Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Dallas, TX, USA

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

1774

Lastpage :

1778

Abstract :

Co-channel speech, which occurs in monaural audio recordings of two or more overlapping talkers, poses a great challenge for automatic speech applications. Automatic speech recognition (ASR) performance, in particular, has been shown to degrade significantly in the presence of a competing talker. In this paper, assuming a known target talker scenario, we present two masking strategies based on speaker verification to alleviate the impact of interference from the competing talker (a.k.a. masker) on ASR performance. In the first approach, frame-level speaker verification likelihoods are used as reliability measures that control the degree to which each frame contributes to the Viterbi search, while in the second approach, time-frequency (T-F) level speaker verification scores form soft masks for speech separation. The effectiveness of the two strategies, both individually and in combination, is evaluated on ASR tasks with speech mixtures at various signal-to-interference ratios (SIRs) ranging from 6 dB to -9 dB. Experimental results indicate the efficacy of the proposed speaker verification based solutions in mitigating the impact of competing talker interference on ASR performance. Combining the two masking techniques yields word error rate reductions as large as 43%.
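The abstract names three computational pieces without giving their formulas: building mixtures at a chosen SIR, turning T-F speaker verification scores into soft masks, and a Viterbi search in which each frame's contribution is controlled by a reliability measure. The Python sketch below shows one plausible realization of each, under stated assumptions: verification scores are taken to be log-likelihood ratios of target versus masker speaker models, a logistic function (with a hypothetical `slope` parameter) maps them into (0, 1), and frame weights multiply the emission log-likelihoods in the Viterbi recursion, which is one common form of weighted Viterbi decoding. These choices, and the function names `mix_at_sir`, `llr_to_reliability`, `soft_mask`, and `weighted_viterbi`, are illustrative assumptions, not the authors' published method.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: names and the score-to-weight mapping are assumptions,
# not taken from the paper.


def mix_at_sir(target: Sequence[float], masker: Sequence[float], sir_db: float) -> List[float]:
    """Add the masker to the target after scaling it to reach the requested SIR (dB)."""
    p_target = sum(x * x for x in target) / len(target)
    p_masker = sum(x * x for x in masker) / len(masker)
    # Gain g satisfies 10*log10(p_target / (g**2 * p_masker)) == sir_db.
    g = math.sqrt(p_target / (p_masker * 10 ** (sir_db / 10)))
    return [t + g * m for t, m in zip(target, masker)]


def llr_to_reliability(llr: float, slope: float = 1.0) -> float:
    """Assumed mapping: logistic squashing of a target-vs-masker log-likelihood ratio into (0, 1)."""
    x = slope * llr
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for large negative scores
    return e / (1.0 + e)


def soft_mask(tf_llr: List[List[float]], slope: float = 1.0) -> List[List[float]]:
    """Turn a (frames x frequency bins) grid of T-F verification scores into soft mask values."""
    return [[llr_to_reliability(v, slope) for v in row] for row in tf_llr]


def weighted_viterbi(log_emit, log_trans, log_init, weights) -> List[int]:
    """Viterbi decoding where frame t's emission log-likelihood is scaled by weights[t].

    Assumption: a reliability weight near 0 makes a frame's acoustic evidence count
    little, so the transition model dominates the decision for that frame.
    """
    n_states = len(log_init)
    delta = [log_init[s] + weights[0] * log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[prev] + log_trans[prev][s] + weights[t] * log_emit[t][s])
            ptr.append(prev)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

In this sketch, frame weights could come from `llr_to_reliability` applied to frame-level scores, and the soft mask would multiply the mixture's T-F magnitudes before feature extraction. Down-weighting a masker-dominated frame lets the transition model override that frame's emission scores, which is the intended effect of a reliability measure.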

Keywords :

cochannel interference; hearing; speaker recognition; speech intelligibility; Viterbi search; automatic speech applications; automatic speech recognition performance; co-channel speech scenarios; competing talker; frame-level speaker verification likelihoods; masking strategies; monaural audio recordings; overlapping talkers; robust ASR; signal-to-interference ratios; soft masks; speaker verification based processing; speech mixtures; speech separation; target talker scenario; time-frequency level speaker verification scores; word error rate; Acoustics; Hidden Markov models; Robustness; Spectrogram; Speech; Speech recognition; Time-frequency analysis; ASR; co-channel speech; soft masking; speaker verification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6853903

Filename :

6853903

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=178081