DocumentCode
2396125
Title
Detection on PSOLA-modified voices by seeking out duplicated fragments
Author
Shen, Yifeng ; Jia, Jia ; Cai, Lianhong
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear
2012
fDate
19-20 May 2012
Firstpage
2177
Lastpage
2182
Abstract
Pitch Synchronous Overlap-Add (PSOLA) refers to a family of signal processing techniques widely used for prosodic modification. These techniques can modify a person's voice by altering the prosodic characteristics of speech, making the voice unrecognizable or unidentifiable. Well-modified voices may even defeat speaker recognition, a critical step in the digital audio forensic framework. Time-domain PSOLA (TD-PSOLA) is the most popular algorithm in the PSOLA family. It supports both time-scaling and pitch-scaling modifications, and the synthesis quality is extremely high provided that the modifications do not exceed a factor of two. This paper presents a simple method for determining whether a given speech waveform has been modified by the TD-PSOLA algorithm. By searching the time-domain waveform for duplicated fragments, we extract the number of duplicated fragments and their occurrence frequency in the voiced portions of speech. A single feature, the duplicated fragments density (DFD), is then calculated and compared with a threshold (obtained from a large body of earlier statistical results) to decide whether the questioned speech waveform has been modified. Experimental results demonstrate the effectiveness of the method in detecting voices whose pitch has been raised and/or whose duration has been lengthened using the TD-PSOLA algorithm.
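The detection idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definition of DFD, so everything here is an assumption made for illustration: the function names, the exact-match criterion on integer samples, the `frag_len` and `max_lag` parameters, and the normalization by the total number of voiced samples are all hypothetical. Voiced/unvoiced segmentation is assumed to have been done beforehand.

```python
from typing import Dict, Sequence, Tuple


def count_duplicated_fragments(samples: Sequence[int], frag_len: int, max_lag: int) -> int:
    """Count start positions whose fragment exactly repeats its most recent
    earlier occurrence at a lag between frag_len and max_lag samples.

    Hypothetical helper: exact matching on integer (e.g. 16-bit PCM) samples
    is an assumption, not the paper's stated criterion.
    """
    count = 0
    last_seen: Dict[Tuple[int, ...], int] = {}  # fragment -> most recent start index
    for start in range(len(samples) - frag_len + 1):
        frag = tuple(samples[start:start + frag_len])
        prev = last_seen.get(frag)
        # Require lag >= frag_len so overlapping self-matches (e.g. in a
        # constant run of samples) are not counted as duplicates.
        if prev is not None and frag_len <= start - prev <= max_lag:
            count += 1
        last_seen[frag] = start
    return count


def duplicated_fragment_density(voiced_segments: Sequence[Sequence[int]],
                                frag_len: int, max_lag: int) -> float:
    """Assumed DFD: duplicated fragments per voiced sample."""
    total = sum(len(seg) for seg in voiced_segments)
    if total == 0:
        return 0.0
    dup = sum(count_duplicated_fragments(seg, frag_len, max_lag)
              for seg in voiced_segments)
    return dup / total


def looks_modified(voiced_segments: Sequence[Sequence[int]], threshold: float,
                   frag_len: int = 4, max_lag: int = 8) -> bool:
    """Flag the waveform when the assumed DFD exceeds a given threshold.

    The threshold must come from statistics on known material; no value
    from the paper is implied here.
    """
    return duplicated_fragment_density(voiced_segments, frag_len, max_lag) > threshold
```

In practice the fragment length and maximum lag would be tied to the pitch period of the voiced segment, and the threshold would be fitted on labeled original and modified recordings; the defaults above are toy values chosen only so the sketch can be run on short lists.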
Keywords
speaker recognition; PSOLA modified voice detection; duplicated fragments; occurrence frequency; pitch synchronous overlap add; seeking out duplicated fragments; signal processing techniques; speaker recognition process; speech voiced portions; speech waveform; voice unidentifiable; voice unrecognizable; Feature extraction; Forensics; Signal processing algorithms; Speech; Speech processing; Timbre; Time domain analysis; Digital Audio Forensic; Duplicated Fragments; PSOLA; Speech Processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location
Yantai
Print_ISBN
978-1-4673-0198-5
Type
conf
DOI
10.1109/ICSAI.2012.6223483
Filename
6223483