Efficient speaker search over large populations using kernelized locality-sensitive hashing

Author

Jeon, Woojay ; Cheng, Yan-Ming

Author_Institution

Samsung Electron., Suwon, South Korea

fYear

2012

fDate

25-30 March 2012

Firstpage

4261

Lastpage

4264

Abstract

We propose a novel method of efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. Because of its computational simplicity, the model allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach while maintaining high accuracy. Furthermore, efficiency can be drastically improved by approximating searches with kernelized locality-sensitive hashing (KLSH). From a speaker's utterance, a set of statistics is extracted according to the utterance comparison model and converted to a set of hash key bits. An approximate nearest neighbor search using the Hamming distance can then be done to find candidate matches with the query speaker, which are rank-ordered by linearly comparing them with the query using the utterance comparison model. Compared to GMM-based speaker identification and some of its variants that have been proposed to increase its efficiency, the proposed KLSH-based method is orders of magnitude faster while compromising a negligible amount of accuracy for sufficiently long query utterances. At a more fundamental level, we also discuss how our speaker matching framework differs from the traditional Bayesian decision rule used for speaker identification.
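
The two-stage search described in the abstract (hash keys, a Hamming-distance shortlist, then an exact re-ranking of the shortlist) can be illustrated with a minimal sketch. This is not the paper's implementation: plain random-hyperplane hashing stands in for KLSH, cosine similarity stands in for the utterance comparison model, and all function names and parameters (`make_hyperplanes`, `hash_key`, `search`, `num_candidates`, `top_k`) are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random hyperplanes; each one contributes one hash bit (assumed stand-in for KLSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_key(vec, hyperplanes):
    """Pack the sign of each projection into an integer hash key."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key

def hamming(a, b):
    """Number of differing bits between two integer hash keys."""
    return bin(a ^ b).count("1")

def cosine(a, b):
    """Exact similarity used for re-ranking (assumed stand-in for the utterance comparison model)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, database, keys, hyperplanes, num_candidates, top_k):
    """Shortlist by Hamming distance on hash keys, then re-rank the shortlist exactly."""
    q_key = hash_key(query, hyperplanes)
    # Stage 1: cheap bit comparisons pick the candidate set.
    shortlist = sorted(range(len(database)),
                       key=lambda i: hamming(q_key, keys[i]))[:num_candidates]
    # Stage 2: the expensive comparison runs only on the shortlist.
    ranked = sorted(shortlist, key=lambda i: cosine(query, database[i]), reverse=True)
    return ranked[:top_k]
```

For simplicity, stage 1 here scans every stored key, so the sketch only shows the division of labor between cheap bit comparisons and the costlier exact comparison; it does not reproduce the paper's search structure or its reported speedups.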

Keywords

Bayes methods; Gaussian processes; approximation theory; search problems; speaker recognition; Bayesian decision rule; GMM-based approach; GMM-based speaker identification; Gaussian mixture model-based approach; KLSH-based method; approximate nearest neighbor search; Hamming distance; kernelized locality-sensitive hashing; query speaker; speaker matching framework; speaker search; utterance comparison model; Computational modeling; Kernel; Mathematical model; Sociology; Speech; Statistics; Vectors; LSH; speaker identification

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6288860

Filename

6288860