Regional Information Center for Science and Technology - Deep Belief Networks Based Voice Activity Detection

DocumentCode :

742860

Title :

Deep Belief Networks Based Voice Activity Detection

Author :

Xiao-Lei Zhang ; Ji Wu

Author_Institution :

Dept. of Electron. Eng., Tsinghua Univ., Beijing, China

Volume :

Issue :

fYear :

2013

fDate :

4/1/2013

Firstpage :

697

Lastpage :

710

Abstract :

Fusing the advantages of multiple acoustic features is important for the robustness of voice activity detection (VAD). Recently, machine-learning-based VADs have shown superiority over traditional VADs on multiple-feature fusion tasks. However, existing machine-learning-based VADs use only shallow models, which cannot explore the underlying manifold of the features. In this paper, we propose to fuse multiple features via a deep model called a deep belief network (DBN). A DBN is a powerful hierarchical generative model for feature extraction. It can describe highly variant functions and discover the manifold of the features. We take the multiple serially concatenated features as the input layer of the DBN, and then extract a new feature by passing these features through multiple nonlinear hidden layers. Finally, we predict the class of the new feature with a linear classifier. Our analysis further shows that even a belief network with a single hidden layer is as powerful as the state-of-the-art models among machine-learning-based VADs. In our empirical comparison, ten common features are used for performance analysis. Extensive experimental results on the AURORA2 corpus show that the DBN-based VAD not only outperforms eleven reference VADs but also meets the real-time detection demands of VAD. The results also show that the DBN-based VAD can fuse the advantages of multiple features effectively.
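
The pipeline described in the abstract (serially concatenated features as the input layer, several nonlinear hidden layers, then a linear classifier) can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature sizes, layer sizes, sigmoid nonlinearity, and randomly initialized weights are all hypothetical, and the training of the DBN is omitted entirely.

```python
import math
import random


def sigmoid(x):
    """Logistic nonlinearity applied in each hidden layer (assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real DBN would learn these from data."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def concatenate_features(feature_vectors):
    """Serially concatenate several per-frame feature vectors into one input."""
    return [value for vec in feature_vectors for value in vec]


def extract_feature(hidden_layers, x):
    """Pass the input through each nonlinear hidden layer in turn."""
    for weights, biases in hidden_layers:
        x = [sigmoid(a) for a in affine(weights, biases, x)]
    return x


def classify(output_layer, feature):
    """Linear classifier on the new feature: 1 = speech, 0 = non-speech."""
    weights, biases = output_layer
    score = affine(weights, biases, feature)[0]
    return (1 if score > 0.0 else 0), score


rng = random.Random(0)

# Hypothetical frame described by three acoustic features of sizes 4, 2 and 3.
frame_features = [[0.2, -0.1, 0.4, 0.0], [1.3, 0.7], [-0.5, 0.1, 0.9]]
x = concatenate_features(frame_features)

# Input layer plus two hidden layers; the sizes are assumptions.
sizes = [len(x), 8, 8]
hidden_layers = [init_layer(sizes[i], sizes[i + 1], rng)
                 for i in range(len(sizes) - 1)]
output_layer = init_layer(sizes[-1], 1, rng)

new_feature = extract_feature(hidden_layers, x)
label, score = classify(output_layer, new_feature)
print(len(x), len(new_feature), label)
```

Because the weights here are untrained, the predicted label carries no meaning; the sketch only shows how the concatenated input flows through the hidden layers to a single linear decision.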

Keywords :

acoustic signal detection; belief networks; feature extraction; learning (artificial intelligence); sensor fusion; signal classification; speech processing; AURORA2 corpus; DBN; acoustic feature; deep belief network; feature fusion; feature manifold; linear classifier; machine-learning-based VAD; nonlinear hidden layer; single-hidden-layer-based belief network; voice activity detection; Acoustics; Fuses; Speech; Support vector machines; Training; Deep learning; information fusion

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2229986

Filename :

6362186

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=742860