Title :
Latent Prosody Analysis for Robust Speaker Identification
Author :
Liao, Yuan-Fu ; Chen, Zi-He ; Juang, Yau-Tarng
Author_Institution :
National Taipei University of Technology, Taipei
Abstract :
Handsets that are not seen in the training phase (unseen handsets) are a significant source of performance degradation for speaker identification (SID) applications in the telecommunication environment. In this paper, a novel latent prosody analysis (LPA) approach is proposed that automatically extracts the most discriminative prosodic cues to assist conventional spectral feature-based SID. The idea of LPA is to transform the SID problem into a task resembling full-text document retrieval via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech handset-TIMIT (HTIMIT) database demonstrate that fusing the LPA prosodic feature-based SID system with a maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral feature-based SID system outperforms both the pitch-and-energy Gaussian mixture model (Pitch-GMM) and the prosodic-state bigram (Bigram) counterparts, both when all handsets are counted and when only unseen handsets are counted.
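Illustrative Sketch :
The retrieval analogy in the abstract can be made concrete with a plain latent semantic analysis over prosodic token counts. The Python sketch below is illustrative only, not the authors' method: it assumes an SVD-based LSA rather than the probabilistic variant, assumes prosodic contours are already tokenized into discrete symbols, and uses hypothetical helper names; the paper's actual tokenizer and the ML-AKI fusion stage are not reproduced here.

import numpy as np

def build_term_document_matrix(speaker_docs, vocab):
    # Rows: prosodic tokens ("terms"); columns: speakers ("documents").
    index = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(speaker_docs)))
    for j, doc in enumerate(speaker_docs):
        for tok in doc:
            A[index[tok], j] += 1.0
    return A

def lsa_project(A, k):
    # Truncated SVD keeps the k leading latent dimensions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Columns of Vt[:k, :] are the speakers' latent coordinates.
    return U[:, :k], s[:k], Vt[:k, :]

def query_speaker(query_tokens, vocab, Uk, sk, speaker_coords):
    # Fold a test utterance into the latent space and rank speakers by
    # cosine similarity, mirroring full-text document retrieval.
    index = {tok: i for i, tok in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for tok in query_tokens:
        if tok in index:
            q[index[tok]] += 1.0
    q_hat = (q @ Uk) / sk  # standard LSA folding-in: S_k^{-1} U_k^T q
    sims = [
        q_hat @ speaker_coords[:, j]
        / (np.linalg.norm(q_hat) * np.linalg.norm(speaker_coords[:, j]) + 1e-12)
        for j in range(speaker_coords.shape[1])
    ]
    return int(np.argmax(sims))

# Toy usage with hypothetical pitch-level tokens H/M/L:
vocab = ["H", "M", "L"]
docs = [["H", "M", "H", "L"], ["L", "L", "M"], ["M", "H", "M", "M"]]
A = build_term_document_matrix(docs, vocab)
Uk, sk, coords = lsa_project(A, k=2)
print(query_speaker(["H", "M", "H"], vocab, Uk, sk, coords))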
Keywords :
Gaussian processes; feature extraction; information retrieval; maximum likelihood estimation; speaker recognition; energy Gaussian mixture model; full-text document retrieval-like task; latent prosody analysis; maximum-likelihood a priori handset knowledge interpolation; performance degradation; prosodic contour tokenization; robust speaker identification; speaker retrieval; spectral feature-based applications; telecommunication environment; Communication channels; Degradation; Interpolation; Robustness; Spatial databases; Speech analysis; System testing; Telephone sets; latent semantic analysis; probabilistic latent semantic analysis; speaker identification; speech prosody
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2007.896660