DocumentCode
3752071
Title
Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods
Author
Naoaki Hashimoto;Kazumasa Yamamoto;Seiichi Nakagawa
Author_Institution
Department of Computer Science and Engineering, Toyohashi University of Technology, Japan
fYear
2015
Firstpage
27
Lastpage
30
Abstract
We investigated speech recognition methods for mixed speech and music that remove only the music, based on non-negative matrix factorization (NMF). In this paper, we introduce the Euclidean distance between log spectra (DLOG) as a distance measure for source separation, which may correspond to the distance measure used in speech recognition, and compare it with traditional measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. We further improved speech recognition performance by pooling the estimated speech, the mixed sound, and clean speech to train the acoustic model. For isolated word recognition with NMF using DLOG, we obtained an improvement over the baseline. Using the Itakura-Saito divergence and the "clean, multi-condition and noise-adaptive training model", we raised the average word recognition rate from 57.6% with the "multi-condition training model" to 80.8%, a 54.7% relative reduction in word error rate (from 42.4% to 19.2%).
Keywords
Decision support systems
Publisher
IEEE
Conference_Titel
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type
conf
DOI
10.1109/APSIPA.2015.7415319
Filename
7415319