DocumentCode :
257809
Title :
Augmented speech production based on real-time statistical voice conversion
Author :
Toda, Tomoki
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
fYear :
2014
fDate :
3-5 Dec. 2014
Firstpage :
592
Lastpage :
596
Abstract :
In human-to-human speech communication, various barriers are caused by some constraints, such as physical constraints causing vocal disorders and environmental constraints making it hard to produce intelligible speech. These barriers would be overcome if our speech production was augmented so that we could produce speech sounds as we want beyond these constraints. Voice conversion (VC) is a technique for modifying speech acoustics, converting non-/para-linguistic information to any form we want while preserving the linguistic content. One of the most popular approaches to VC is based on statistical processing, which is capable of extracting a complex conversion function in a data-driven manner. Although this technique was originally studied in the context of speaker conversion, which converts the voice of a certain speaker to sound like that of another specific speaker, it has great potential to achieve various applications beyond speaker conversion. This paper briefly reviews a trajectory-based conversion method that is capable of effectively reproducing natural speech parameter trajectories utterance by utterance and highlights several techniques that extend this trajectory-based conversion method to achieve real-time conversion processing. Finally, this paper shows some examples of real-time VC applications to enhance human-to-human speech communication, such as speaking-aid, silent speech communication, and voice changer/vocal effector.
Keywords :
speaker recognition; speech processing; statistical analysis; augmented speech production; complex conversion function extraction; data-driven method; environmental constraints; human-to-human speech communication enhancement; intelligible speech production; linguistic content preservation; natural speech parameter trajectory reproduction; nonlinguistic information; paralinguistic information; physical constraints; real-time VC applications; real-time conversion processing; real-time statistical voice conversion; silent speech communication; speaker conversion; speaking-aid; speech acoustics; speech sound production; statistical processing; trajectory-based conversion method; vocal disorders; vocal effector; voice changer; Hidden Markov models; Real-time systems; Speech; Speech enhancement; Vectors; augmented speech production; human-to-human speech communication enhancement; real-time processing; statistical voice conversion;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on
Conference_Location :
Atlanta, GA
Type :
conf
DOI :
10.1109/GlobalSIP.2014.7032186
Filename :
7032186