• DocumentCode
    3777096
  • Title

    A fast approach to spoken term detection based on prosodic dynamic features

  • Author

    Xuejiao Tan; Lei Wang

  • Author_Institution
    School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China
  • fYear
    2015
  • Firstpage
    593
  • Lastpage
    596
  • Abstract
    Model-based spoken term detection usually requires a large amount of annotated training data. When training data are scarce, a method based on dynamic time warping (DTW) is a better choice. However, both the model-based and the classical DTW-based methods rely on frame-by-frame template matching, so the computational load is heavy and the search is inefficient. We propose a fast two-stage approach to spoken term detection. In the first stage, prosodic dynamic features are used to rapidly locate hypothesized spoken term regions; in the second stage, Gaussian posteriorgrams are used to verify those local hypothesized regions more precisely. Since each prosodic feature vector has only three dimensions and represents several consecutive frames of speech at once, template matching can be performed on segments rather than frames, which accelerates the whole keyword detection process. The two-stage method exploits both the long-term and short-term characteristics of speech. An experiment demonstrates that our method improves speed while obtaining similar detection performance under the same conditions.
  • Keywords
    "Feature extraction","Computational modeling","Maximum likelihood detection","Matched filters","Nonlinear filters","Speech"
  • Publisher
    IEEE
  • Conference_Titel
    Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
  • Print_ISBN
    978-1-4673-8086-7
  • Type

    conf

  • DOI
    10.1109/PIC.2015.7489917
  • Filename
    7489917