DocumentCode :
1689305
Title :
Synthetic speech detection using temporal modulation feature
Author :
Zhizheng Wu ; Xiong Xiao ; Eng Siong Chng ; Haizhou Li
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ. (NTU), Singapore, Singapore
fYear :
2013
Firstpage :
7234
Lastpage :
7238
Abstract :
Voice conversion and speaker adaptation techniques present a threat to current state-of-the-art speaker verification systems. To prevent such spoofing attacks and enhance the security of speaker verification systems, it is necessary to develop anti-spoofing techniques that distinguish synthetic from human speech. In this study, we continue the quest to discriminate between synthetic and human speech. Current analysis-synthesis techniques operate at the frame level and assume frame-by-frame independence. Motivated by this, we propose to adopt magnitude/phase modulation features to distinguish synthetic speech from human speech. Modulation features derived from the magnitude/phase spectrum carry long-term temporal information about speech. They may therefore be able to detect temporal artifacts caused by the frame-by-frame processing used to synthesize the speech signal. Our synthetic speech detection results show that the modulation features provide information complementary to the magnitude/phase features. The best detection performance is obtained by fusing phase modulation features with phase features. This fusion yields an equal error rate (EER) of 0.89%, which is significantly lower than the 1.25% EER of phase features and the 10.98% EER of MFCC features.
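A temporal modulation feature can be illustrated with a minimal sketch. It assumes a plain log-magnitude spectrogram and a fixed segment length. The function names `magnitude_spectrogram` and `modulation_features` are hypothetical, and so are all parameter values (25 ms frames with a 10 ms hop at 16 kHz, 50-frame segments, 10 modulation bins). The exact filterbank, window lengths, phase representation and post-processing used by the authors are not given in this record, so this is not their pipeline. Each frequency bin's trajectory over a block of frames is Fourier-transformed along time. The resulting modulation spectrum captures inter-frame dynamics that frame-independent synthesis may fail to reproduce.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Frame-level log-magnitude spectrogram, shape (n_frames, n_fft // 2 + 1).

    Hypothetical parameters: 400/160 samples = 25 ms / 10 ms at 16 kHz.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(spec + 1e-10)  # small offset avoids log(0)

def modulation_features(spec, seg_len=50, seg_hop=25, n_mod=10):
    """Segment-level modulation features from a (frames x bins) feature matrix.

    For each block of seg_len frames, each frequency trajectory is
    mean-normalised and transformed along the time axis. The magnitudes of
    modulation bins 1..n_mod are kept, and the (bins x n_mod) result is
    flattened into one vector per segment.
    """
    feats = []
    for start in range(0, spec.shape[0] - seg_len + 1, seg_hop):
        seg = spec[start : start + seg_len]              # (seg_len, n_bins)
        seg = seg - seg.mean(axis=0, keepdims=True)      # remove per-bin mean (DC)
        mod = np.abs(np.fft.rfft(seg, axis=0))           # modulation spectrum along time
        feats.append(mod[1 : n_mod + 1].T.ravel())       # skip the zeroed DC bin
    return np.array(feats)
```

With one second of 16 kHz audio, the spectrogram has 98 frames of 257 bins. With 50-frame segments and a 25-frame hop, that gives two segments of 257 × 10 = 2570 values each. The same segment-level transform could, as an assumption, be applied to a frame-level phase-based feature matrix instead of the log-magnitude spectrogram to obtain a phase modulation counterpart.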
Keywords :
phase modulation; security of data; speaker recognition; speech processing; speech synthesis; MFCC features; frame-by-frame independence assumption; human speech; phase modulation; security; speaker adaptation techniques; speaker verification systems; speech signal synthesis; spoofing attack; synthetic speech detection; temporal modulation feature; voice conversion; Feature extraction; Mel frequency cepstral coefficient; Phase modulation; Spectrogram; Speech; Speech processing; Anti-spoofing attack; modulation; phase modulation; synthetic detection; temporal feature;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639067
Filename :
6639067