A Case Study on Back-End Voice Activity Detection for Distributed Speech Recognition System Using Support Vector Machines

Author

Touazi, Azzedine ; Debyeche, Mohamed

Author_Institution

Lab. de Commun. Parlée et de Traitement du Signal (LCPTS), Univ. des Sci. et de la Technol. Houari Boumediene, Algiers, Algeria

fYear

2014

Firstpage

21

Lastpage

26

Abstract

Recently, Voice Activity Detection (VAD) algorithms based on machine learning techniques have shown impressive results in the area of speech recognition. In this paper, we present a case study and discuss the performance of VAD based on Support Vector Machines (SVM) for a Distributed Speech Recognition (DSR) system. In this case study, speech and non-speech frames are detected from the compressed Mel Frequency Cepstral Coefficients (MFCCs) at the back-end (e.g., server) side, with the aim of improving VAD performance and reducing the compression bit-rate at the front-end side. Using an SVM trained with a polynomial kernel, the SVM-based VAD produces encouraging detection results. The classification task, conducted on the Aurora-2 speech database under different noise conditions, shows VAD performance comparable to that of the ETSI Advanced Front-End (ETSI-AFE) standard.

Keywords

cepstral analysis; learning (artificial intelligence); signal classification; speech recognition; support vector machines; Aurora-2 speech database; DSR system; ETSI Advanced Front-End standard; ETSI-AFE standard; MFCC; SVM training; SVM-based VAD; VAD performance improvement; back-end server side; back-end voice activity detection; classification task; compressed mel frequency cepstral coefficients; compression bit-rate reduction; distributed speech recognition system; front-end side; machine learning techniques; noise conditions; nonspeech frames; polynomial kernel; speech frames; support vector machines; Feature extraction; Kernel; Mel frequency cepstral coefficient; Polynomials; Speech; Speech recognition; Support vector machines; DSR system; mel frequency cepstral coefficients; support vector machines; voice activity detection;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal-Image Technology and Internet-Based Systems (SITIS), 2014 Tenth International Conference on

Type

conf

DOI

10.1109/SITIS.2014.54

Filename

7081520