Regional Information Center for Science and Technology (RICEST) - Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network

DocumentCode :

179895

Title :

Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network

Author :

Jian Xue ; Jinyu Li ; Dong Yu ; Mike Seltzer ; Yifan Gong

Author_Institution :

Microsoft Corp., Redmond, WA, USA

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

6359

Lastpage :

6363

Abstract :

The large number of parameters in deep neural networks (DNNs) for automatic speech recognition (ASR) makes speaker adaptation very challenging. It also limits the use of speaker personalization because of the huge storage cost in large-scale deployments. In this paper we address DNN adaptation and personalization issues by presenting two methods based on the singular value decomposition (SVD). The first method uses an SVD to replace the weight matrix of a speaker-independent DNN by the product of two low-rank matrices. Adaptation is then performed by updating a square matrix inserted between the two low-rank matrices. In the second method, we adapt the full weight matrix but store only the delta matrix, that is, the difference between the original and adapted weight matrices. We decrease the footprint of the adapted model by storing a reduced-rank version of the delta matrix via an SVD. The proposed methods were evaluated on a short message dictation task. Experimental results show that we can obtain accuracy improvements similar to those of the previously proposed Kullback-Leibler divergence (KLD) regularized method with far fewer parameters, requiring only 0.89% of the original model storage.
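
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the matrix sizes, the rank `k`, the helper name `low_rank_factors`, and the random perturbation standing in for an adapted weight matrix are all assumptions made for illustration, and no actual training is performed.

```python
import numpy as np

def low_rank_factors(W, k):
    """Hypothetical helper: return A (m x k) and B (k x n) such that
    A @ B is the best rank-k approximation of W (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Fold the top-k singular values into the left factor.
    return U[:, :k] * s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
m, n, k = 64, 48, 8  # illustrative sizes only, not taken from the paper

# Stand-in for one speaker-independent weight matrix.
W = rng.standard_normal((m, n))

# Method 1 (sketch): replace W by A @ B, then insert a square k x k
# matrix S between the factors. S starts as the identity, so the model
# is unchanged before adaptation; only S would be updated per speaker.
A, B = low_rank_factors(W, k)
S = np.eye(k)
W_method1 = A @ S @ B
params_method1 = S.size  # k * k values stored per speaker

# Method 2 (sketch): adapt the full matrix, but store only a
# reduced-rank version of the delta (adapted minus original).
W_adapted = W + 0.01 * rng.standard_normal((m, n))  # assumed stand-in
delta = W_adapted - W
D1, D2 = low_rank_factors(delta, k)
W_method2 = W + D1 @ D2  # approximate adapted matrix rebuilt at load time
params_method2 = D1.size + D2.size  # k * (m + n) values stored per speaker

print(params_method1, params_method2, W.size)
```

In both cases the per-speaker storage grows with the chosen rank `k` rather than with the full `m * n` matrix size, which is the source of the footprint reduction the abstract describes.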

Keywords :

matrix algebra; neural nets; singular value decomposition; speech recognition; ASR; DNN adaptation; KLD; Kullback-Leibler divergence; SVD; automatic speech recognition; deep neural network; delta matrix; low rank matrices; low-footprint speaker adaptation; short message dictation task; speaker independent DNN; speaker personalization; square matrix; storage cost; weight matrix; Accuracy; Adaptation models; Data models; Hidden Markov models; Matrix decomposition; Neural networks; Silicon; speaker adaptation

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854828

Filename :

6854828

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=179895