DocumentCode
3697460
Title
Many-to-one voice conversion using exemplar-based sparse representation
Author
Ryo Aihara;Tetsuya Takiguchi;Yasuo Ariki
Author_Institution
Graduate School of System Informatics, Kobe University, Japan
fYear
2015
Firstpage
1
Lastpage
5
Abstract
Voice conversion (VC) is widely researched in the field of speech processing because of growing interest in applications such as personalized text-to-speech systems. In this paper we present a many-to-one VC method using exemplar-based sparse representation, which differs from conventional statistical VC. In our previous exemplar-based VC method, input speech is represented as a sparse combination of exemplars from a source dictionary; because the source and target dictionaries are fully coupled, the converted voice is constructed by applying the source-side sparse coefficients to the target dictionary. Constructing these dictionaries requires parallel exemplars: source and target exemplars drawn from the same texts uttered by the source and target speakers. In this paper, we propose a many-to-one VC method in an exemplar-based framework that requires no training data from the source speaker. Although several statistical approaches to many-to-one VC have been proposed, no such method has previously been formulated in an exemplar-based framework. The effectiveness of our many-to-one VC is confirmed by comparison with a conventional one-to-one NMF-based method and a one-to-one GMM-based method.
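The one-to-one conversion step described in the abstract (input speech factored over a source dictionary, converted speech rebuilt from the coupled target dictionary) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `exemplar_vc`, the Euclidean multiplicative-update rule, and the L1 sparsity weight are assumptions chosen for a self-contained example.

```python
import numpy as np

def exemplar_vc(X, A_src, A_tgt, n_iter=200, sparsity=0.1):
    """Sketch of exemplar-based VC (hypothetical helper, not the paper's code).

    Estimates nonnegative sparse activations H such that X ~= A_src @ H,
    then maps to the target speaker as Y = A_tgt @ H. The columns of
    A_src and A_tgt are assumed to be parallel exemplars (same frame
    index in both dictionaries corresponds to the same phonetic content).
    """
    rng = np.random.default_rng(0)
    # nonnegative random initialization of the activation matrix
    H = rng.random((A_src.shape[1], X.shape[1])) + 1e-3
    for _ in range(n_iter):
        # multiplicative update for the Euclidean cost with an
        # L1 penalty on H to encourage sparse activations
        numer = A_src.T @ X
        denom = A_src.T @ (A_src @ H) + sparsity + 1e-12
        H *= numer / denom
    # converted features: target exemplars weighted by source activations
    return A_tgt @ H, H
```

A toy call with random dictionaries (`A_src`, `A_tgt` of shape features × exemplars, `X` of shape features × frames) returns converted features of shape features × frames plus the shared activation matrix.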
Keywords
"Dictionaries","Speech","Training data","Sparse matrices","Matrix converters","Signal processing","Noise robustness"
Publisher
ieee
Conference_Titel
2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Type
conf
DOI
10.1109/WASPAA.2015.7336943
Filename
7336943
Link To Document