Regional Information Center for Science and Technology

DocumentCode :

3166231

Title :

Music models for music-speech separation

Author :

Hughes, Thad ; Kristjansson, Trausti

Author_Institution :

Google Res., Mountain View, CA, USA

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4917

Lastpage :

4920

Abstract :

We consider the task of speech recognition with loud music background interference. We use model-based music-speech separation and train GMM models for music on the audio prior to speech. We show over 8% relative improvement in WER at 10 dB SNR for a real-world Voice Search ASR system. We investigate the relationship between ASR accuracy and the amount of music background used as prologue and the size of music models. Our study shows that performance peaks when using a music prologue of around 6 seconds to train the music model. We hypothesize that this is due to the dynamic nature of music and the structure of popular music. Adding more history beyond a certain point does not improve results. Additionally, we show that moderately sized 8-component music GMM models suffice to model this amount of music prologue.

Keywords :

Gaussian processes; speech recognition; 8-component music GMM models; Gaussian mixture model; SNR; WER; model-based music-speech separation; music models; music prologue; speech recognition task; voice search ASR system; Computational modeling; Data models; Noise; Speech; Speech recognition; Training; Training data; ASR; music; noise reduction; noise robustness; non-stationary noise

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6289022

Filename :

6289022

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3166231