• DocumentCode
    781434
  • Title

    Robust speech features based on wavelet transform with application to speaker identification

  • Author

    Hsieh, C.-T. ; Lai, E. ; Wang, Y.-C.

  • Author_Institution
    Dept. of Electr. Eng., Tamkang Univ., Taipei, Taiwan
  • Volume
    149
  • Issue
    2
  • fYear
    2002
  • fDate
    4/1/2002 12:00:00 AM
  • Firstpage
    108
  • Lastpage
    114
  • Abstract
    An effective and robust speech feature extraction method is presented. Exploiting the time-frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into several frequency channels. To capture the characteristics of an individual speaker, the linear predictive cepstral coefficients of the approximation channel and the entropy value of the detail channel are calculated at each decomposition level. In addition, an adaptive thresholding technique is applied at each lower resolution to suppress noise interference. Experimental results show that this mechanism not only reduces the influence of noise but also improves recognition performance. The proposed method is evaluated on the MAT telephone speech database for text-independent speaker identification using the group vector quantisation identifier. Several popular existing methods are evaluated for comparison, and the results show that the proposed feature extraction algorithm is more effective and robust than these methods. The proposed method also performs satisfactorily even in a low SNR environment corrupted by Gaussian white noise.
  • Keywords
    cepstral analysis; entropy; feature extraction; speaker recognition; time-frequency analysis; vector quantisation; wavelet transforms; Gaussian white noise; MAT telephone speech database; adaptive thresholding technique; approximation channel; detail channel; entropy value; feature extraction; frequency channels; group vector quantisation identifier; linear predictive cepstral coefficients; low SNR environment; noise interference; robust speech; speaker identification; text-independent identification; time-frequency multiresolution property; wavelet transform;
  • fLanguage
    English
  • Journal_Title
    Vision, Image and Signal Processing, IEE Proceedings -
  • Publisher
    IET
  • ISSN
    1350-245X
  • Type

    jour

  • DOI
    10.1049/ip-vis:20020121
  • Filename
    1018001
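
  • Illustrative sketch
    The feature pipeline described in the abstract (wavelet decomposition, then linear predictive cepstral coefficients of each approximation channel plus an entropy value of each detail channel) might be sketched roughly as follows. This is not the authors' implementation. The Haar wavelet, the three decomposition levels, the LPC order of 12, the energy-based Shannon entropy, and all function names are assumptions chosen for illustration. The adaptive thresholding step is left out because the abstract does not state its rule.

```python
import numpy as np


def haar_step(x):
    """One level of a Haar wavelet transform: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])  # pad odd-length input by repeating the last sample
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def shannon_entropy(coeffs, eps=1e-12):
    """Shannon entropy of the normalised energy distribution (an assumed definition)."""
    energy = np.asarray(coeffs, dtype=float) ** 2
    p = energy / (energy.sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))


def lpc(x, order):
    """Predictor coefficients a[1..order] (x[n] ~ sum_k a_k x[n-k]),
    computed with the autocorrelation method and the Levinson-Durbin recursion."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0] + 1e-12  # small guard against an all-zero frame
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err = max(err * (1.0 - k * k), 1e-12)
    return a[1:]


def lpcc(a):
    """Convert predictor coefficients to cepstral coefficients c[1..p]
    using c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}."""
    p = len(a)
    c = np.zeros(p)
    for n in range(1, p + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def wavelet_features(frame, levels=3, order=12):
    """Concatenate, per level, the LPCC of the approximation and the detail entropy."""
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.extend(lpcc(lpc(approx, order)))
        feats.append(shannon_entropy(detail))
    return np.array(feats)


# Usage on a synthetic 512-sample frame (random data standing in for speech).
rng = np.random.default_rng(0)
demo_frame = rng.standard_normal(512)
demo_features = wavelet_features(demo_frame)
```

    With these assumed settings, each frame yields levels × (order + 1) values, which would then be passed to a classifier such as the vector quantisation identifier named in the abstract.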