Title :
Fast NMF based approach and VQ based approach using MFCC distance measure for speech recognition from mixed sound
Author :
Nakano, Shunsuke ; Yamamoto, Koji ; Nakagawa, Sachiko
Author_Institution :
Dept. of Comput. Sci. & Eng., Toyohashi Univ. of Technol., Toyohashi, Japan
Date :
Oct. 29, 2013 - Nov. 1, 2013
Abstract :
We have considered a speech recognition method for mixed sound, consisting of speech and music, that removes only the music using vector quantization (VQ) and non-negative matrix factorization (NMF). Instead of the conventional amplitude spectrum distance measure, we introduce an MFCC distance measure, which is not affected by pitch. For isolated word recognition using the clean speech model, a word error reduction rate of 53% was obtained compared with the case where music is not removed. Furthermore, a high recognition rate, close to that of clean speech recognition, was obtained at 10 dB. In the multi-condition case, our proposed method reduced the error rate by 67% compared with the multi-condition model.
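The error-reduction figures in the abstract are relative numbers. This record does not give the paper's exact definition, so the sketch below assumes the usual one: the drop in error rate divided by the baseline error rate. The function name and the sample error rates are hypothetical, chosen only to illustrate the arithmetic, and are not taken from the paper.

```python
def relative_error_reduction(baseline_error, improved_error):
    """Fractional reduction in error rate relative to the baseline.

    Assumed definition: (baseline - improved) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical word error rates (NOT results from the paper).
baseline = 0.40    # recognition without music removal
improved = 0.188   # recognition after music removal

reduction = relative_error_reduction(baseline, improved)
print(f"relative error reduction: {reduction:.0%}")
```

Under this definition, a reduction of 53% means the remaining error rate is a little under half of the baseline; it does not mean the absolute error rate fell by 53 percentage points.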
Keywords :
cepstral analysis; matrix decomposition; music; speech recognition; vector quantisation; MFCC distance measure; VQ based approach; amplitude spectrum distance measure; clean speech model; clean speech recognition; fast NMF based approach; isolated word recognition; mel-frequency cepstrum coefficient; mixed sound; non-negative matrix factorization; speech recognition method; vector quantization; word error reduction rate; Cepstrum; Hidden Markov models; Music; Speech; Speech coding; Speech recognition; Vectors
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific
Conference_Location :
Kaohsiung
DOI :
10.1109/APSIPA.2013.6694143