Word-Conditioned Phone N-Grams for Speaker Recognition

Author

Lei, Haozhen ; Mirghafori, N.

Author_Institution

Int. Comput. Sci. Inst., Berkeley, CA, USA

Volume

4

fYear

2007

fDate

15-20 April 2007

Abstract

We extend the state of the art by applying word-conditioning to constrain the phone N-gram features used in speaker recognition. Feature-level combination of 52 word unigrams constraining phone N-grams of order 1, 2, and 3 proved to be the best approach. Our system improves on a non-word-conditioned phone N-gram system by 18% and 27% on SRE05 and SRE06, respectively. Furthermore, when each system is combined with a GMM-based system, the word-conditioned system improves on the non-word-conditioned one by 18% and 37% on SRE05 and SRE06, suggesting that the word-conditioned features are more complementary. On both corpora, this approach achieves a 4.7% EER standalone, and a 3.3% EER in combination with the non-word-conditioned phone N-gram and GMM-based systems. Note that the word-conditioning approach uses only 43% of the SRE05 data.
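The idea of constraining phone N-grams by word unigrams can be illustrated with a minimal sketch. This is not the authors' implementation: the input format (a list of word and phone-sequence pairs), the function names, the toy keyword set, and the global relative-frequency normalization are all assumptions made for illustration, and the abstract does not specify the alignment, keyword list, or classifier actually used.

```python
from collections import Counter

def word_conditioned_ngrams(aligned, keywords, orders=(1, 2, 3)):
    """Count phone N-grams that occur inside occurrences of selected words.

    aligned:  list of (word, [phone, ...]) pairs; a hypothetical input format
              standing in for a word-aligned phone recognizer output.
    keywords: set of word unigrams used as conditioning constraints.
    Returns a Counter keyed by (word, phone_ngram_tuple).
    """
    counts = Counter()
    for word, phones in aligned:
        if word not in keywords:
            continue  # phones outside the keyword set are discarded
        for n in orders:
            for i in range(len(phones) - n + 1):
                counts[(word, tuple(phones[i:i + n]))] += 1
    return counts

def relative_frequencies(counts):
    """Normalize counts to sum to 1 (an assumed normalization)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}

# Toy conversation side: only "yeah" is in the (hypothetical) keyword set.
aligned = [
    ("yeah", ["y", "ae"]),
    ("the", ["dh", "ah"]),
    ("yeah", ["y", "eh", "ah"]),
]
feats = word_conditioned_ngrams(aligned, keywords={"yeah"})
freqs = relative_frequencies(feats)
```

Because phones outside the keyword set are dropped, only part of the data contributes to the features, which is consistent with the abstract's remark that the approach uses only a fraction of the SRE05 data.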

Keywords

Gaussian processes; speaker recognition; speech processing; GMM-based system; word unigrams; word-conditioned phone N-grams; word-conditioning approach; Cepstral analysis; Computer science; Detectors; Feature extraction; Hidden Markov models; Humans; Loudspeakers; Speech recognition; Testing; high-level features; phone N-grams; word-conditioning

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on

Conference_Location

Honolulu, HI

ISSN

1520-6149

Print_ISBN

1-4244-0727-3

Type

conf

DOI

10.1109/ICASSP.2007.366897

Filename

4218085