DocumentCode
45363
Title
Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information
Author
Sanchez, Jon ; Saratxaga, Ibon ; Hernaez, Inma ; Navas, Eva ; Erro, Daniel ; Raitio, Tuomo
Author_Institution
Aholab Signal Process. Lab., Univ. of the Basque Country, Bilbao, Spain
Volume
10
Issue
4
fYear
2015
fDate
Apr-15
Firstpage
810
Lastpage
820
Abstract
In the field of speaker verification (SV), it is now feasible and relatively easy to create a synthetic voice that deceives a speech-driven biometric access system. This paper presents a synthetic speech detector that can be connected at the front end or at the back end of a standard SV system and that protects it from spoofing attacks coming from state-of-the-art statistical text-to-speech (TTS) systems. The system described is a Gaussian mixture model (GMM) based binary classifier whose models are trained on natural and copy-synthesized signals obtained from the Wall Street Journal database. Three different state-of-the-art vocoders are chosen and modeled using two sets of acoustic parameters: 1) relative phase shift and 2) canonical Mel frequency cepstral coefficient (MFCC) parameters, as a baseline. The vocoder dependency of the system and multivocoder modeling features are thoroughly studied. Additional phase-aware vocoders are also tested. Several experiments are carried out, showing that the phase-based parameters perform better and are able to cope with new, unknown attacks. The final evaluations, which test synthetic TTS signals obtained from the Blizzard Challenge, validate our proposal.
Keywords
Gaussian processes; biometrics (access control); mixture models; speech synthesis; vocoders; GMM; Gaussian mixture model based binary classifier; MFCC; Wall Street Journal database; acoustic parameters; blizzard challenge; canonical mel frequency cepstral coefficients parameters; copy-synthesized signals; phase information; phase-aware vocoders; relative phase shift; speaker verification; speech driven biometric access system; standard SV system; statistical text to speech systems; synthetic TTS signals; synthetic speech detector; synthetic voice; universal synthetic speech spoofing detection; Databases; Harmonic analysis; Mel frequency cepstral coefficient; Speech; Speech synthesis; Training; Vocoders; BIO-MODA-VOI; Voice biometrics; anti-spoofing; phase information; synthetic speech detection; voice biometrics;
fLanguage
English
Journal_Title
Information Forensics and Security, IEEE Transactions on
Publisher
IEEE
ISSN
1556-6013
Type
jour
DOI
10.1109/TIFS.2015.2398812
Filename
7029029