DocumentCode
694549
Title
Onset detection algorithm in voice activity detection for Mandarin
Author
Huan Wang ; Lei Wang
Author_Institution
Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear
2013
fDate
12-13 Oct. 2013
Firstpage
1148
Lastpage
1151
Abstract
Voice activity detection (VAD) is one of the most challenging problems in speech signal processing. Statistical model-based VADs have been widely studied in the recent literature; they usually rely on hangover algorithms to prevent clipping of weak speech tails. However, little attention has been paid to initial consonants, and non-negligible onset detection errors may be incurred, especially when the SNR is low. Since most Mandarin syllables start with an initial consonant, this paper proposes an onset detection algorithm to improve the performance of VAD for Mandarin. Although consonants are mostly noise-like, their spectral energy is distributed more towards the higher frequencies. Exploiting this characteristic, the proposed algorithm uses the posterior SNR of the high-frequency band to decide whether weak-start detection may have been dampened by noise, and then applies a correction after estimating whether any weak-start speech frames were mistaken for nonspeech frames. Results show that the proposed algorithm achieves a considerable performance improvement. Furthermore, the algorithm is independent of noise type.
Keywords
maximum likelihood estimation; natural language processing; speech recognition; Mandarin syllables; hangover algorithms; high-frequency band; initial consonants; noise type; noise-like consonants; nonnegligible onset detection errors; nonspeech frames; onset detection algorithm; performance improvement; posterior SNR; spectral energy distribution; speech signal processing; statistical model-based VAD; voice activity detection; weak-speech tail clipping prevention; weak-start detection; weak-start speech frames; Detection algorithms; Hidden Markov models; Signal processing algorithms; Signal to noise ratio; Speech; Speech processing; likelihood ratio test; onset detection; voice activity detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location
Dalian
Type
conf
DOI
10.1109/ICCSNT.2013.6967305
Filename
6967305
Link To Document