Regional Information Center for Science and Technology - Voice conversion based on Non-negative matrix factorization using phoneme-categorized dictionary

DocumentCode :

180500

Title :

Voice conversion based on Non-negative matrix factorization using phoneme-categorized dictionary

Author :

Aihara, Ryo ; Nakashika, Toru ; Takiguchi, Tetsuya ; Ariki, Yasuo

Author_Institution :

Grad. Sch. of Syst. Inf., Kobe Univ., Kobe, Japan

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

7894

Lastpage :

7898

Abstract :

This paper presents an exemplar-based voice conversion (VC) method using a phoneme-categorized dictionary. Sparse representation-based VC using Non-negative matrix factorization (NMF) is employed for spectral conversion between different speakers. In our previous NMF-based VC method, source exemplars and target exemplars are extracted from parallel training data, in which the source and target speakers utter the same texts. The input source signal is represented using the source exemplars and their weights. The converted speech is then constructed from the target exemplars and the weights estimated for the source exemplars. However, this exemplar-based approach needs to hold all the training exemplars (frames), and it may cause phoneme mismatches between input signals and selected exemplars. To reduce this mismatching of phoneme alignment, we propose a phoneme-categorized sub-dictionary and a dictionary selection method using NMF. Using the sub-dictionary improves VC performance compared to a conventional NMF-based VC. The effectiveness of the method was confirmed by comparison with a conventional Gaussian Mixture Model (GMM)-based method and a conventional NMF-based method.
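The exemplar-based conversion summarized in the abstract can be illustrated with a minimal NumPy sketch. It assumes magnitude spectra stored as non-negative matrices (features by frames), parallel source and target dictionaries with matching columns, and the common multiplicative update for KL-divergence NMF with an L1 sparsity penalty; the record does not state the exact cost function or update rule, so these are assumptions. All function and parameter names (`estimate_activations`, `convert`, `select_category`, `sparsity`) are hypothetical, and the per-frame category choice shown here is only one plausible reading of "dictionary selection", not necessarily the authors' procedure.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12, seed=0):
    """Estimate non-negative weights H such that X ~= A @ H, with A held fixed.

    Assumed update (KL divergence + L1 penalty), not taken from the record:
        H <- H * (A^T (X / (A H))) / (A^T 1 + sparsity)
    X: (n_features, n_frames), A: (n_features, n_exemplars).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None]  # A^T 1, one value per exemplar
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)
        H *= (A.T @ ratio) / (col_sums + sparsity)
    return H


def convert(X_src, A_src, A_tgt, **kwargs):
    """Represent the input with source exemplars, then rebuild it from the
    paired target exemplars using the same weights."""
    H = estimate_activations(X_src, A_src, **kwargs)
    return A_tgt @ H, H


def select_category(H, labels):
    """Hypothetical selection rule: for each frame, pick the phoneme category
    whose exemplars carry the largest total weight.

    labels: integer category id for each exemplar (row of H).
    """
    cats = np.unique(labels)
    mass = np.stack([H[labels == c].sum(axis=0) for c in cats])
    return cats[mass.argmax(axis=0)]
```

In a sub-dictionary variant, the category returned by `select_category` for a frame could be used to restrict `A_src` and `A_tgt` to the columns with that label before calling `convert` again on that frame, so that weights are only spread over exemplars of one phoneme category.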

Keywords :

Gaussian processes; matrix decomposition; mixture models; sparse matrices; speaker recognition; Gaussian mixture model; input source signal; nonnegative matrix factorization; phoneme categorized dictionary; source exemplars; sparse representation; spectral conversion; target exemplars; voice conversion; Dictionaries; Speech; Training; Vectors; sub-dictionary

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6855137

Filename :

6855137

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=180500