• DocumentCode
    3162795
  • Title
    Efficient speaker search over large populations using kernelized locality-sensitive hashing
  • Author
    Jeon, Woojay; Cheng, Yan-Ming
  • Author_Institution
    Samsung Electron., Suwon, South Korea
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4261
  • Lastpage
    4264
  • Abstract
    We propose a novel method of efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. Because of its computational simplicity, the model allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach while maintaining high accuracy. Furthermore, efficiency can be drastically improved by approximating searches using kernelized locality-sensitive hashing (KLSH). From a speaker's utterance, a set of statistics is extracted according to the utterance comparison model and converted to a set of hash key bits. An approximate nearest neighbor search using the Hamming distance can then be done to find candidate matches for the query speaker, which are rank-ordered by linearly comparing them with the query using the utterance comparison model. Compared to GMM-based speaker identification and some of its variants that have been proposed to increase its efficiency, the proposed KLSH-based method is orders of magnitude faster while sacrificing only a negligible amount of accuracy for sufficiently long query utterances. At a more fundamental level, we also discuss how our speaker matching framework differs from the traditional Bayesian decision rule used for speaker identification.
  • Keywords
    Bayes methods; Gaussian processes; approximation theory; search problems; speaker recognition; Bayesian decision rule; GMM-based approach; GMM-based speaker identification; Gaussian mixture model-based approach; KLSH-based method; approximate nearest neighbor search; Hamming distance; kernelized locality-sensitive hashing; query speaker; speaker matching framework; speaker search; utterance comparison model; Computational modeling; Kernel; Mathematical model; Sociology; Speech; Statistics; Vectors; lsh; speaker identification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288860
  • Filename
    6288860
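
The two-stage search outlined in the abstract (hash key bits, candidate retrieval by Hamming distance, then rank-ordering of the candidates with an exact comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain random-hyperplane hashing as a stand-in for KLSH, cosine similarity as a stand-in for the utterance comparison model, and a linear scan over hash keys where a real system would use a sublinear lookup. All function and parameter names are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per hash bit (stand-in for KLSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_key(vec, hyperplanes):
    """Pack the sign of each projection into an integer hash key."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        key = (key << 1) | (1 if dot >= 0.0 else 0)
    return key


def hamming(a, b):
    """Number of differing bits between two integer hash keys."""
    return bin(a ^ b).count("1")


def cosine(u, v):
    """Exact similarity used for re-ranking (assumed stand-in for the
    utterance comparison model)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, database, keys, hyperplanes, num_candidates=10):
    """Return candidate indices, best match first."""
    query_key = hash_key(query, hyperplanes)
    # Stage 1: keep the entries whose hash keys are closest in Hamming distance.
    by_hamming = sorted(range(len(database)),
                        key=lambda i: hamming(query_key, keys[i]))
    candidates = by_hamming[:num_candidates]
    # Stage 2: rank-order only those candidates with the exact comparison.
    return sorted(candidates,
                  key=lambda i: cosine(query, database[i]),
                  reverse=True)
```

In this sketch the hash keys for the enrolled population would be computed once with `hash_key` and stored, so that a query costs one hashing step, cheap bitwise comparisons, and only `num_candidates` exact comparisons.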