Title :
On automatic voice casting for expressive speech: Speaker recognition vs. speech classification
Author :
Obin, Nicolas ; Roebel, A. ; Bachman, Gregoire
Author_Institution :
IRCAM, UPMC, Paris, France
Abstract :
This paper presents the first large-scale automatic voice casting system and explores the adaptation of speaker recognition techniques to measure voice similarity. The proposed system represents a voice by classes (e.g., age/gender, voice quality, emotion). First, a multi-label system classifies speech into classes. Then, the output probabilities for each class are concatenated to form a vector that represents the vocal signature of a speech recording. Finally, a similarity search is performed on the vocal signatures to determine the set of target actors that are most similar to a speech recording of a source actor. In a subjective experiment conducted in a real voice casting context for video games, the multi-label system clearly outperforms standard speaker recognition systems. This suggests that speech classes successfully capture the principal directions used in the perception of voice similarity.
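The signature-and-search steps described in the abstract can be illustrated with a minimal sketch. It assumes the per-class probabilities have already been produced by some classifier; the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and all function names, class categories, actor names, and probability values are hypothetical.

```python
import math


def vocal_signature(class_probs):
    """Concatenate per-category probability lists into one flat vector.

    class_probs maps a category name (hypothetical, e.g. "emotion") to the
    list of output probabilities for the classes of that category.
    """
    signature = []
    for category in sorted(class_probs):  # fixed order keeps vectors comparable
        signature.extend(class_probs[category])
    return signature


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_targets(source_sig, target_sigs, top_k=3):
    """Return the top_k (name, score) pairs, most similar target first."""
    scored = [(name, cosine_similarity(source_sig, sig))
              for name, sig in target_sigs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up probabilities for one source recording and two target actors.
source = vocal_signature({"age_gender": [0.1, 0.9],
                          "emotion": [0.7, 0.2, 0.1]})
targets = {
    "actor_a": vocal_signature({"age_gender": [0.2, 0.8],
                                "emotion": [0.6, 0.3, 0.1]}),
    "actor_b": vocal_signature({"age_gender": [0.9, 0.1],
                                "emotion": [0.1, 0.1, 0.8]}),
}
ranking = rank_targets(source, targets, top_k=2)
```

In this toy data, `actor_a` shares the source's dominant classes and therefore ranks first. A real system would replace the hand-written probabilities with classifier outputs and might use a different distance.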
Keywords :
computer games; language translation; natural language processing; probability; signal classification; speaker recognition; speech processing; expressive speech; large-scale automatic voice casting system; multilabel system; output probabilities; similarity search; speaker recognition technique; speech classification; speech recording; video games; vocal signatures; voice representation; voice similarity measurement; voice similarity perception; Acoustics; Casting; Speech; Speech recognition; Support vector machines; Vectors; voice casting; voice similarity
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853737