Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information

Author

Sanchez, Jon ; Saratxaga, Ibon ; Hernaez, Inma ; Navas, Eva ; Erro, Daniel ; Raitio, Tuomo

Author_Institution

Aholab Signal Process. Lab., Univ. of the Basque Country, Bilbao, Spain

Volume

10

Issue

4

fYear

2015

fDate

Apr-15

Firstpage

810

Lastpage

820

Abstract

In speaker verification (SV), it is now feasible and relatively easy to create a synthetic voice that deceives a speech-driven biometric access system. This paper presents a synthetic speech detector that can be connected at the front end or the back end of a standard SV system to protect it from spoofing attacks by state-of-the-art statistical text-to-speech (TTS) systems. The detector is a Gaussian mixture model (GMM) based binary classifier whose models are trained on natural and copy-synthesized signals obtained from the Wall Street Journal database. Three state-of-the-art vocoders are chosen and modeled using two sets of acoustic parameters: 1) relative phase shift and 2) canonical mel-frequency cepstral coefficients (MFCC), used as the baseline. The vocoder dependency of the system and multivocoder modeling features are thoroughly studied. Additional phase-aware vocoders are also tested. The experiments show that the phase-based parameters perform better and can cope with new, unknown attacks. The final evaluations, which test synthetic TTS signals obtained from the Blizzard Challenge, validate the proposal.
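
The abstract describes a GMM-based binary classifier with one model for natural speech and one for synthetic speech. The following is a minimal sketch of how such a two-model decision could be scored, not the authors' implementation. Everything in it is an assumption made for illustration: the diagonal covariances, the average log-likelihood ratio as the score, the zero threshold, the toy two-dimensional "features" (standing in for relative phase shift or MFCC vectors), and all function and variable names.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as a list of (weight, mean, var)."""
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def llr_score(frames, natural_gmm, synthetic_gmm):
    """Average per-frame log-likelihood ratio: natural model vs. synthetic model."""
    total = sum(
        gmm_loglik(f, natural_gmm) - gmm_loglik(f, synthetic_gmm) for f in frames
    )
    return total / len(frames)

def is_natural(frames, natural_gmm, synthetic_gmm, threshold=0.0):
    """Accept the utterance as natural speech if the score exceeds the threshold."""
    return llr_score(frames, natural_gmm, synthetic_gmm) > threshold

# Hypothetical, hand-picked models; real models would be trained on
# natural and copy-synthesized speech features.
NATURAL_GMM = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
SYNTHETIC_GMM = [(1.0, [5.0, 5.0], [1.0, 1.0])]

if __name__ == "__main__":
    natural_like = [[0.1, -0.2], [0.9, 1.1]]
    synthetic_like = [[5.1, 4.8], [4.9, 5.2]]
    print(is_natural(natural_like, NATURAL_GMM, SYNTHETIC_GMM))
    print(is_natural(synthetic_like, NATURAL_GMM, SYNTHETIC_GMM))
```

In a sketch like this, the threshold trades false acceptances of synthetic speech against false rejections of natural speech, so in practice it would be tuned on held-out data rather than fixed at zero.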

Keywords

Gaussian processes; biometrics (access control); mixture models; speech synthesis; vocoders; GMM; Gaussian mixture model based binary classifier; MFCC; Wall Street Journal database; acoustic parameters; Blizzard Challenge; canonical mel frequency cepstral coefficients parameters; copy-synthesized signals; phase information; phase-aware vocoders; relative phase shift; speaker verification; speech driven biometric access system; standard SV system; statistical text to speech systems; synthetic TTS signals; synthetic speech detector; synthetic voice; universal synthetic speech spoofing detection; Databases; Harmonic analysis; Mel frequency cepstral coefficient; Speech; Speech synthesis; Training; Vocoders; BIO-MODA-VOI; Voice biometrics; anti-spoofing; synthetic speech detection; voice biometrics

fLanguage

English

Journal_Title

Information Forensics and Security, IEEE Transactions on

Publisher

IEEE

ISSN

1556-6013

Type

jour

DOI

10.1109/TIFS.2015.2398812

Filename

7029029