DocumentCode :
3863291
Title :
Music removal by convolutional denoising autoencoder in speech recognition
Author :
Mengyuan Zhao;Dong Wang;Zhiyong Zhang;Xuewei Zhang
Author_Institution :
Center for Speech and Language Technology (CSLT) Research Institute of Information Technology, Tsinghua University, Tsinghua National Lab for Information Science and Technology
fYear :
2015
Firstpage :
338
Lastpage :
341
Abstract :
Music embedded in speech signals often causes significant performance degradation in automatic speech recognition (ASR). This paper proposes a music-removal method based on a denoising autoencoder (DAE) that learns to remove music from music-embedded speech signals. In particular, we focus on a convolutional denoising autoencoder (CDAE), which can learn local musical patterns through convolutional feature extraction. Our study shows that the CDAE model can learn the patterns of music in different genres, and that CDAE-based music removal significantly improves ASR performance. Additionally, we demonstrate that this music-removal approach is largely language independent: a model trained on data in one language can be applied to remove music from speech in another language, and models trained on multilingual data may lead to better performance.
Keywords :
"Multiple signal classification","Speech","Training","Convolution","Speech recognition","Data models","Harmonic analysis"
Publisher :
IEEE
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type :
conf
DOI :
10.1109/APSIPA.2015.7415289
Filename :
7415289