• DocumentCode
    1544802
  • Title

    Robust text-independent speaker identification over telephone channels

  • Author

    Murthy, Hema A. ; Beaufays, Françoise ; Heck, Larry P. ; Weintraub, Mitchel

  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • Volume
    7
  • Issue
    5
  • fYear
    1999
  • fDate
    9/1/1999 12:00:00 AM
  • Firstpage
    554
  • Lastpage
    568
  • Abstract
    This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing data, and concentrates on two approaches: (1) extracting features that are robust against channel variations and (2) transforming the speaker models to compensate for channel effects. First, an experimental study shows that optimizing the front-end processing of the speech signal can significantly improve speaker recognition performance. A new filterbank design is introduced to improve the robustness of the speech spectrum computation in the front-end unit. Next, a new feature based on spectral slopes is described. Its ability to discriminate between speakers is shown to be superior to that of the traditional cepstrum. This feature can be used alone or combined with the cepstrum. The second part of the paper presents two model transformation methods that further reduce channel effects. These methods make use of a locally collected stereo database to estimate a speaker-independent variance transformation for each speech feature used by the classifier. The transformations constructed on this stereo database can then be applied to speaker models derived from other databases. Combined, the methods developed in this paper resulted in a 38% relative improvement on the closed-set 30-s training 5-s testing condition of the NIST'95 Evaluation task, after cepstral mean removal.
  • Keywords
    cepstral analysis; channel bank filters; feature extraction; filtering theory; signal classification; speaker recognition; telecommunication channels; telephony; NIST'95 Evaluation task; acoustic mismatches; cepstral mean removal; cepstrum; channel effects compensation; channel variations; classifier; closed-set text-independent speaker identification; experimental study; feature extraction; filterbank design; front end processing; front-end unit; locally collected stereo database; model transformation methods; robust text-independent speaker identification; speaker models; speaker recognition performance; speaker-independent variance transformation; spectral slopes; speech samples; speech signal; speech spectrum computation; telephone channels; testing data; training data; Acoustic testing; Cepstrum; Data mining; Feature extraction; Loudspeakers; Robustness; Signal processing; Spatial databases; Speech processing; Telephony
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.784108
  • Filename
    784108