• DocumentCode
    27460
  • Title

    Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping

  • Author

    Mantena, Gautam ; Achanta, Sivanand ; Prahallad, K.

  • Author_Institution
    Int. Inst. of Inf. Technol. (IIIT-H), Hyderabad, India
  • Volume
    22
  • Issue
    5
  • fYear
    2014
  • fDate
    May 2014
  • Firstpage
    946
  • Lastpage
    955
  • Abstract
    The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a DTW variant referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which makes it faster than NS-DTW.
  • Keywords
    Gaussian processes; cepstral analysis; frequency-domain analysis; query processing; signal detection; speech recognition; time warp simulation; DTW based algorithm; Gaussian posteriorgrams; Mel-frequency cepstral coefficients; QbE-STD; computational upper bound; fast NS-DTW; frequency domain linear prediction cepstral coefficients; nonsegmental dynamic time warping; perceptual linear prediction cepstral coefficients; query-by-example spoken term detection; speech signal; spoken audio data; spoken query; traditional spectral parameters; Computational modeling; Frequency-domain analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Vectors; Dynamic time warping; fast search; frequency domain linear prediction; query-by-example spoken term detection;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2014.2311322
  • Filename
    6763005
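
The O(mn/α²) bound quoted in the abstract can be illustrated with a small sketch. It assumes that "reduced feature vectors" means averaging every α consecutive frames of both the query (m frames) and the reference (n frames); the record does not say how the reduction is actually done, so that choice, and the helper names `reduce_frames` and `local_distance_matrix`, are hypothetical. The code only counts the local-distance evaluations that a DTW-style cost matrix needs. It is not an implementation of NS-DTW or FNS-DTW.

```python
import math


def reduce_frames(frames, alpha):
    """Average each run of `alpha` consecutive frames.

    Assumption: this averaging scheme is illustrative only; the source
    does not specify how the feature vectors are reduced.
    """
    reduced = []
    for start in range(0, len(frames), alpha):
        block = frames[start:start + alpha]
        dim = len(block[0])
        reduced.append([sum(f[d] for f in block) / len(block) for d in range(dim)])
    return reduced


def local_distance_matrix(query, reference):
    """Pairwise Euclidean distances; filling this m-by-n matrix is the O(mn) term."""
    return [[math.dist(q, r) for r in reference] for q in query]


def count_cells(matrix):
    """Number of local-distance evaluations stored in the matrix."""
    return sum(len(row) for row in matrix)


# Toy data: m = 8 query frames and n = 20 reference frames, 2 dimensions each.
query = [[float(i), 0.0] for i in range(8)]
reference = [[float(i), 1.0] for i in range(20)]
alpha = 2

full = local_distance_matrix(query, reference)
small = local_distance_matrix(reduce_frames(query, alpha),
                              reduce_frames(reference, alpha))

print(count_cells(full), count_cells(small))  # 160 40
```

With α = 2 the matrix shrinks from 8 × 20 = 160 cells to 4 × 10 = 40 cells, a factor of α² = 4, which matches the stated bound under the averaging assumption.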