Title :
Latent Prosody Analysis for Robust Speaker Identification
Author :
Liao, Yuan-Fu ; Chen, Zi-He ; Juang, Yau-Tarng
Author_Institution :
National Taipei University of Technology, Taipei
Abstract :
Handsets that are not seen in the training phase (unseen handsets) are a significant source of performance degradation for speaker identification (SID) applications in the telecommunication environment. In this paper, a novel latent prosody analysis (LPA) approach is proposed that automatically extracts the most discriminative prosodic cues to assist conventional spectral feature-based SID. The idea of LPA is to transform the SID problem into a task resembling full-text document retrieval via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech handset-TIMIT (HTIMIT) database demonstrate that fusing the LPA prosodic feature-based SID system with a maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral feature-based SID system outperforms both the pitch-and-energy Gaussian mixture model (Pitch-GMM) and the prosodic-state bigram (Bigram) counterparts, both when all handsets are counted and when only unseen handsets are counted.
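Illustrative Sketch :
The retrieval analogy in the abstract can be made concrete with a plain latent semantic analysis over prosodic token counts. The Python sketch below is illustrative only, not the authors' method: it assumes an SVD-based LSA rather than the probabilistic variant, assumes prosodic contours are already tokenized into discrete symbols, and uses hypothetical helper names; the paper's actual tokenizer and the ML-AKI fusion stage are not reproduced here.

import numpy as np

def build_term_document_matrix(speaker_docs, vocab):
    # Rows: prosodic tokens ("terms"); columns: speakers ("documents").
    index = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(speaker_docs)))
    for j, doc in enumerate(speaker_docs):
        for tok in doc:
            A[index[tok], j] += 1.0
    return A

def lsa_project(A, k):
    # Truncated SVD keeps the k leading latent dimensions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Columns of Vt[:k, :] are the speakers' latent coordinates.
    return U[:, :k], s[:k], Vt[:k, :]

def query_speaker(query_tokens, vocab, Uk, sk, speaker_coords):
    # Fold a test utterance into the latent space and rank speakers by
    # cosine similarity, mirroring full-text document retrieval.
    index = {tok: i for i, tok in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for tok in query_tokens:
        if tok in index:
            q[index[tok]] += 1.0
    q_hat = (q @ Uk) / sk  # standard LSA folding-in: S_k^{-1} U_k^T q
    sims = [
        q_hat @ speaker_coords[:, j]
        / (np.linalg.norm(q_hat) * np.linalg.norm(speaker_coords[:, j]) + 1e-12)
        for j in range(speaker_coords.shape[1])
    ]
    return int(np.argmax(sims))

# Toy usage with hypothetical pitch-level tokens H/M/L:
vocab = ["H", "M", "L"]
docs = [["H", "M", "H", "L"], ["L", "L", "M"], ["M", "H", "M", "M"]]
A = build_term_document_matrix(docs, vocab)
Uk, sk, coords = lsa_project(A, k=2)
print(query_speaker(["H", "M", "H"], vocab, Uk, sk, coords))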
Keywords :
Gaussian processes; feature extraction; information retrieval; maximum likelihood estimation; speaker recognition; energy Gaussian mixture model; full-text document retrieval-like task; latent prosody analysis; maximum-likelihood a priori handset knowledge interpolation; performance degradation; prosodic contour tokenization; robust speaker identification; speaker retrieval; spectral feature-based applications; telecommunication environment; Communication channels; Degradation; Interpolation; Robustness; Spatial databases; Speech analysis; System testing; Telephone sets; latent semantic analysis; probabilistic latent semantic analysis; speaker identification; speech prosody
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2007.896660