• DocumentCode
    2700702
  • Title
    Divergence-Based Similarity Measure for Spoken Document Retrieval
  • Author
    Peng Liu ; Soong, Frank K. ; Jian-Lai Zhou
  • Author_Institution
    Microsoft Res. Asia, Beijing, China
  • Volume
    4
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    We propose a novel divergence-based similarity measure for spoken document retrieval (SDR). We first derive a dynamic programming algorithm that measures the Kullback-Leibler divergence between two HMMs. The measure is then generalized to a graph matching algorithm that is efficient for SDR applications. The proposed approach compares the underlying acoustic models of the keywords and the target database, which alleviates the impact of mismatched vocabularies and language models (e.g., across different domains). Experimental results on the Wall Street Journal (WSJ) database show that the proposed approach achieves performance comparable to that of the word-posterior-based approach, and outperforms it when the language model is mismatched. The approach is promising for building open-vocabulary, domain-independent SDR applications.
  • Keywords
    document handling; graph theory; hidden Markov models; information retrieval; mathematical programming; matrix algebra; speech processing; HMM; Kullback-Leibler divergence; divergence-based similarity measure; graph matching algorithm; programming algorithm; spoken document retrieval; word posterior based approach; Acoustic applications; Acoustic measurements; Asia; Costs; Databases; Dynamic programming; Hidden Markov models; Natural languages; Speech; Vocabulary
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2007.367170
  • Filename
    4218044