DocumentCode :
180500
Title :
Voice conversion based on Non-negative matrix factorization using phoneme-categorized dictionary
Author :
Aihara, Ryo ; Nakashika, Toru ; Takiguchi, Tetsuya ; Ariki, Yasuo
Author_Institution :
Grad. Sch. of Syst. Inf., Kobe Univ., Kobe, Japan
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
7894
Lastpage :
7898
Abstract :
This paper presents an exemplar-based voice conversion (VC) method that uses a phoneme-categorized dictionary. Sparse-representation-based VC using non-negative matrix factorization (NMF) is employed for spectral conversion between speakers. In our previous NMF-based VC method, source and target exemplars are extracted from parallel training data, in which the source and target speakers utter the same texts. The input source signal is represented by the source exemplars and their weights. The converted speech is then constructed from the target exemplars and the weights estimated for the source exemplars. However, this exemplar-based approach must hold all the training exemplars (frames), and it may cause phoneme mismatches between the input signal and the selected exemplars. To reduce this mismatch in phoneme alignment, we propose a phoneme-categorized sub-dictionary and a dictionary selection method using NMF. Using the sub-dictionary improves VC performance over conventional NMF-based VC. The effectiveness of the method was confirmed by comparison with a conventional Gaussian mixture model (GMM)-based method and a conventional NMF-based method.
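The conversion scheme outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The multiplicative update (KL divergence with an L1 sparsity penalty), the `sparsity` value, and the rule that picks a sub-dictionary per frame by summing first-pass activations per phoneme category are all assumptions made for illustration, as are the function names `estimate_activations`, `convert`, and `convert_with_subdictionaries`. Dictionaries are assumed to be non-negative spectral matrices whose columns are parallel (aligned) source and target exemplars.

```python
import numpy as np


def estimate_activations(X, D, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative weights H with X ~= D @ H, keeping D fixed.

    Uses the standard multiplicative update for the KL divergence with an
    L1 penalty on H (an assumed choice, not taken from the record).
    X: (n_bins, n_frames), D: (n_bins, n_exemplars).
    """
    rng = np.random.default_rng(0)
    H = rng.random((D.shape[1], X.shape[1])) + eps
    col_sums = D.sum(axis=0)[:, None]  # D^T 1, one value per exemplar
    for _ in range(n_iter):
        ratio = X / (D @ H + eps)
        H *= (D.T @ ratio) / (col_sums + sparsity + eps)
    return H


def convert(X_src, D_src, D_tgt, **kw):
    """Weights are estimated on the source dictionary, then applied to the
    parallel target dictionary to build the converted spectrum."""
    H = estimate_activations(X_src, D_src, **kw)
    return D_tgt @ H


def convert_with_subdictionaries(X_src, D_src, D_tgt, labels, **kw):
    """Hypothetical per-frame sub-dictionary selection.

    labels[i] is the phoneme category of exemplar (column) i. A first NMF
    pass over the full dictionary scores each category per frame; each frame
    is then converted using only the exemplars of its best-scoring category.
    """
    labels = np.asarray(labels)
    cats = np.unique(labels)
    H_full = estimate_activations(X_src, D_src, **kw)
    scores = np.stack([H_full[labels == c].sum(axis=0) for c in cats])
    chosen = cats[scores.argmax(axis=0)]

    Y = np.zeros((D_tgt.shape[0], X_src.shape[1]))
    for c in cats:
        frames = chosen == c
        if frames.any():
            atoms = labels == c
            Y[:, frames] = convert(
                X_src[:, frames], D_src[:, atoms], D_tgt[:, atoms], **kw
            )
    return Y, chosen
```

Restricting each frame to a single category keeps the selected exemplars phonetically consistent with the input frame, which is the mismatch the abstract sets out to reduce. In practice the categories would come from phoneme labels on the training data, which this sketch takes as given.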
Keywords :
Gaussian processes; matrix decomposition; mixture models; sparse matrices; speaker recognition; Gaussian mixture model; input source signal; nonnegative matrix factorization; phoneme categorized dictionary; source exemplars; sparse representation; spectral conversion; target exemplars; voice conversion; Dictionaries; Speech; Training; Vectors; sub-dictionary
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6855137
Filename :
6855137