Detection on PSOLA-modified voices by seeking out duplicated fragments

Author

Shen, Yifeng ; Jia, Jia ; Cai, Lianhong

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear

2012

fDate

19-20 May 2012

Firstpage

2177

Lastpage

2182

Abstract

Pitch Synchronous Overlap-Add (PSOLA) refers to a family of signal processing techniques widely used for prosodic modification. These techniques can alter the prosodic characteristics of a person's speech, making the voice unrecognizable or unidentifiable. A well-modified voice may even defeat speaker recognition, a critical step in the digital audio forensic framework. Time-domain PSOLA (TD-PSOLA) is the most popular algorithm in the PSOLA family. It supports both time-scaling and pitch-scaling modifications, and the synthesis quality is extremely high provided that the modification factor does not exceed two. This paper presents a simple method for determining whether a given speech waveform has been modified by the TD-PSOLA algorithm. We search the time-domain waveform for duplicated fragments and extract both the number of duplicated fragments and their occurrence frequency in the voiced portions of speech. A single feature, the duplicated fragments density (DFD), is then calculated and compared with a threshold (obtained from a large body of earlier statistical results) to decide whether the questioned waveform has been modified. Experimental results demonstrate the effectiveness of the method in detecting voices whose pitch has been raised and/or whose duration has been lengthened using the TD-PSOLA algorithm.
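
The detection idea in the abstract (find duplicated fragments, reduce them to a single density feature, compare against a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define a fragment, the exact DFD formula, or the threshold value. The sketch assumes that a fragment is a run of `frag_len` consecutive integer PCM samples, that a duplicate is an exact repeat of an earlier fragment, that the density is the duplicate count divided by the number of windows examined, and that the input has already been restricted to voiced samples. The function names, `frag_len`, and `threshold` are all hypothetical.

```python
def count_duplicated_fragments(samples, frag_len=8):
    """Count windows of frag_len samples that exactly repeat an earlier window.

    Assumption: 'duplicated' means sample-for-sample equality, which is only
    meaningful for integer PCM data.
    """
    seen = set()
    duplicates = 0
    for start in range(len(samples) - frag_len + 1):
        frag = tuple(samples[start:start + frag_len])
        if frag in seen:
            duplicates += 1
        else:
            seen.add(frag)
    return duplicates


def duplicated_fragment_density(samples, frag_len=8):
    """Illustrative density: duplicated windows divided by windows examined.

    Assumption: this ratio stands in for the paper's DFD, whose exact
    definition is not given in the abstract.
    """
    n_windows = len(samples) - frag_len + 1
    if n_windows <= 0:
        return 0.0
    return count_duplicated_fragments(samples, frag_len) / n_windows


def looks_modified(samples, threshold, frag_len=8):
    """Flag the waveform when the density exceeds a caller-supplied threshold."""
    return duplicated_fragment_density(samples, frag_len) > threshold


# Toy data: a strictly increasing ramp has no repeated windows; copying a
# 10-sample stretch into it mimics a repeated pitch period.
unique = list(range(1, 41))
modified = unique[:20] + unique[10:20] + unique[20:]

print(duplicated_fragment_density(unique, frag_len=4))
print(duplicated_fragment_density(modified, frag_len=4))
```

In practice the threshold would have to be estimated from labelled original and modified recordings, as the abstract indicates, and silent or unvoiced stretches would need to be excluded first because runs of identical samples would otherwise inflate the count.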

Keywords

speaker recognition; PSOLA modified voice detection; duplicated fragments; occurrence frequency; pitch synchronous overlap add; seeking out duplicated fragments; signal processing techniques; speaker recognition process; speech voiced portions; speech waveform; voice unidentifiable; voice unrecognizable; Feature extraction; Forensics; Signal processing algorithms; Speech; Speech processing; Timbre; Time domain analysis; Digital Audio Forensic; Duplicated Fragments; PSOLA; Speech Processing;

fLanguage

English

Publisher

IEEE

Conference_Title

Systems and Informatics (ICSAI), 2012 International Conference on

Conference_Location

Yantai

Print_ISBN

978-1-4673-0198-5

Type

conf

DOI

10.1109/ICSAI.2012.6223483

Filename

6223483