DocumentCode :
178003
Title :
Deep learning for monaural speech separation
Author :
Po-Sen Huang ; Minje Kim ; Mark Hasegawa-Johnson ; Paris Smaragdis
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
1562
Lastpage :
1566
Abstract :
Monaural source separation is useful for many real-world applications, but it remains a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose jointly optimizing deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint. Moreover, we explore a discriminative training criterion for the neural networks to further enhance separation performance. We evaluate our approaches on a monaural speech separation task using the TIMIT speech corpus. Our proposed models achieve about 3.8-4.9 dB SIR gain over NMF models, while maintaining better SDRs and SARs.
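The masking layer described in the abstract can be illustrated with a minimal sketch of time-frequency soft masking: the network's two source estimates are turned into masks that sum to one in each time-frequency bin, and the masks are applied to the mixture magnitude spectrogram so the separated sources exactly reconstruct the mixture. The function name and the additive-mixture test data below are illustrative, not from the paper.

```python
import numpy as np

def soft_mask_separate(y1_hat, y2_hat, mixture_mag):
    """Apply a time-frequency soft mask (a sketch of the reconstruction
    constraint): masks are the relative magnitudes of the two network
    outputs, so the two masked estimates sum to the mixture magnitude.

    y1_hat, y2_hat: network estimates of the two source magnitude
                    spectrograms (freq x time arrays)
    mixture_mag:    magnitude spectrogram of the observed mixture
    """
    denom = np.abs(y1_hat) + np.abs(y2_hat) + 1e-8  # avoid divide-by-zero
    m1 = np.abs(y1_hat) / denom  # mask for source 1, in [0, 1]
    m2 = np.abs(y2_hat) / denom  # mask for source 2; m1 + m2 ~= 1
    return m1 * mixture_mag, m2 * mixture_mag
```

Because the masks sum to one per bin, the separated spectrograms add back up to the mixture, which is the reconstruction constraint the extra layer enforces during joint training.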
Keywords :
learning (artificial intelligence); recurrent neural nets; signal reconstruction; source separation; speech processing; NMF models; SARs; SDRs; TIMIT speech corpus; deep learning models; deep neural networks; masking layer; monaural source separation; monaural speech separation; reconstruction constraint; recurrent neural networks; Artificial neural networks; Discrete Fourier transforms; Source separation; Speech; Time-frequency analysis; Training; Deep Learning; Monaural Source Separation; Time-Frequency Masking;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853860
Filename :
6853860