DocumentCode
27460
Title
Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping
Author
Mantena, Gautam ; Achanta, Sivanand ; Prahallad, K.
Author_Institution
Int. Inst. of Inf. Technol. (IIIT-H), Hyderabad, India
Volume
22
Issue
5
fYear
2014
fDate
May-14
Firstpage
946
Lastpage
955
Abstract
The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a variant of the DTW algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which is lower than that of NS-DTW.
Keywords
Gaussian processes; cepstral analysis; frequency-domain analysis; query processing; signal detection; speech recognition; time warp simulation; DTW based algorithm; Gaussian posteriorgrams; Mel-frequency cepstral coefficients; QbE-STD; computational upper bound; fast NS-DTW; frequency domain linear prediction cepstral coefficients; nonsegmental dynamic time warping; perceptual linear prediction cepstral coefficients; query-by-example spoken term detection; speech signal; spoken audio data; spoken query; traditional spectral parameters; Computational modeling; Frequency-domain analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Vectors; Dynamic time warping; fast search; frequency domain linear prediction; query-by-example spoken term detection;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2014.2311322
Filename
6763005