Regional Information Center for Science and Technology (RICeST) - Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms

DocumentCode :

3744820

Title :

Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms

Author :

Tara N. Sainath; Ron J. Weiss; Kevin W. Wilson; Arun Narayanan; Michiel Bacchiani; Andrew

Author_Institution :

Google, Inc., New York, NY, USA

fYear :

2015

Firstpage :

Lastpage :

Abstract :

Multichannel automatic speech recognition (ASR) systems commonly use separate modules to perform speech enhancement and acoustic modeling. In this paper, we present an algorithm that performs multichannel enhancement jointly with the acoustic model, using a raw-waveform convolutional LSTM deep neural network (CLDNN). We show that the proposed method offers a roughly 5% relative improvement in word error rate (WER) over a log-mel CLDNN trained on multiple channels. Analysis shows that the proposed network learns to be robust to varying angles of arrival for the target speaker and performs as well as a model that is given oracle knowledge of the true location. Finally, we show that training such a network on inputs captured using multiple (linear) array configurations results in a model that is robust to a range of microphone spacings.
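The abstract describes the joint enhancement front end only at a high level. The following is a rough, non-authoritative sketch of one way a multichannel raw-waveform first layer can be structured. It assumes a time-convolution layer in which each of P output filters owns one FIR filter per microphone, the per-channel outputs are summed (filter-and-sum), and the result is max-pooled over the frame, rectified, and log-compressed. The function names `conv1d_valid`, `filter_and_sum`, and `multichannel_tconv`, the whole-frame max pooling, and the `eps=0.01` log floor are illustrative assumptions, not details confirmed by this record.

```python
import math
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution: y[t] = sum_n h[n] * x[t + N - 1 - n]."""
    n_taps = len(h)
    return [
        sum(h[n] * x[t + n_taps - 1 - n] for n in range(n_taps))
        for t in range(len(x) - n_taps + 1)
    ]


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[float]]) -> List[float]:
    """Convolve each microphone channel with its own FIR filter, then sum across channels."""
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    outputs = [conv1d_valid(x, h) for x, h in zip(channels, filters)]
    return [sum(vals) for vals in zip(*outputs)]


def multichannel_tconv(channels: Sequence[Sequence[float]],
                       filter_bank: Sequence[Sequence[Sequence[float]]],
                       eps: float = 0.01) -> List[float]:
    """Hypothetical first layer: one feature per multichannel filter in the bank.

    For each filter (a list of per-channel FIRs): filter-and-sum, max-pool over
    the whole frame, ReLU, then a log with a small floor (eps is an assumption).
    """
    features = []
    for filters in filter_bank:
        y = filter_and_sum(channels, filters)
        pooled = max(y)                      # max-pool over the entire frame
        rectified = max(0.0, pooled)         # ReLU
        features.append(math.log(rectified + eps))
    return features


if __name__ == "__main__":
    mic1 = [0, 0, 1, 0, 0, 0]
    mic2 = [0, 0, 0, 1, 0, 0]  # same impulse arrives one sample later at mic 2
    bank = [
        [[0, 1], [1, 0]],  # delays mic1 by one sample: aligns the two channels
        [[1, 0], [0, 1]],  # delays mic2 instead: misaligns them
    ]
    print(multichannel_tconv([mic1, mic2], bank))
```

In the toy input, the same impulse reaches the second microphone one sample after the first. The filter pair that delays the first channel by one sample lines the two copies up, so they add constructively and yield the larger feature. This is delay-and-sum beamforming expressed as learnable per-channel filters, which gives some intuition for how a jointly trained first layer could become sensitive to the direction of arrival.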

Keywords :

"Array signal processing","Microphone arrays","Convolution","Training","Reverberation"

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type :

conf

DOI :

10.1109/ASRU.2015.7404770

Filename :

7404770

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3744820