DocumentCode :
3244069
Title :
Voice signatures
Author :
Shafran, Izhak ; Riley, Michael ; Mohri, Mehryar
Author_Institution :
AT&T Labs.-Res., USA
fYear :
2003
fDate :
30 Nov.-3 Dec. 2003
Firstpage :
31
Lastpage :
36
Abstract :
Most current spoken-dialog systems only extract sequences of words from a speaker's voice. This largely ignores other useful information that can be inferred from speech such as gender, age, dialect, or emotion. These characteristics of a speaker's voice, voice signatures, whether static or dynamic, can be useful for speech mining applications or for the design of a natural spoken-dialog system. This paper explores the problem of automatically and accurately extracting voice signatures from a speaker's voice. We investigate two approaches for extracting speaker traits: the first focuses on general acoustic and prosodic features, the second on the choice of words used by the speaker. In the first approach, we show that standard speech/nonspeech HMMs, conditioned on speaker traits and evaluated on cepstral and pitch features, achieve accuracies well above chance for all examined traits. The second approach, using support vector machines with rational kernels applied to speech recognition lattices, attains an accuracy of about 8.1 % in the task of binary classification of emotion. Our results are based on a corpus of speech data collected from a deployed customer-care application (HMIHY 0300). While still preliminary, our results are significant and show that voice signatures are of practical interest in real-world applications.
Keywords :
cepstral analysis; customer services; emotion recognition; feature extraction; hidden Markov models; interactive systems; speaker recognition; support vector machines; HMIHY 0300; HMM; acoustic features; cepstral features; customer-care application; emotion classification; pitch features; prosodic features; rational kernels; speaker traits; speech mining; speech recognition lattices; spoken-dialog systems; support vector machines; voice signatures; word choice; Cepstral analysis; Data mining; Hidden Markov models; Kernel; Lattices; Loudspeakers; Speech analysis; Speech recognition; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
Type :
conf
DOI :
10.1109/ASRU.2003.1318399
Filename :
1318399