• DocumentCode
    672397
  • Title

    An empirical study of confusion modeling in keyword search for low resource languages

  • Author

    Saraclar, Murat ; Sethy, Abhinav ; Ramabhadran, Bhuvana ; Mangu, Lidia ; Jia Cui ; Xiaodong Cui ; Kingsbury, Brian ; Mamou, Jonathan

  • Author_Institution
    IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    464
  • Lastpage
    469
  • Abstract
    Keyword search, in the context of low resource languages, has emerged as a key area of research. The dominant approach in keyword search is to use Automatic Speech Recognition (ASR) as a front end to produce a representation of audio that can be indexed. The biggest drawback of this approach lies in its inability to deal with out-of-vocabulary words and query terms that are not in the ASR system output. In this paper we present an empirical study evaluating various approaches based on using confusion models as query expansion techniques to address this problem. We present results across four languages using a range of confusion models, which lead to significant improvements in keyword search performance as measured by the Maximum Term Weighted Value (MTWV) metric.
  • Keywords
    query formulation; speech recognition; vocabulary; ASR; MTWV metric; audio representation; automatic speech recognition; confusion modeling; keyword search; low resource languages; maximum term weighted value metric; out-of-vocabulary words; query terms; Acoustics; Computational modeling; Hidden Markov models; Indexes; Keyword search; Lattices; Transducers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type

    conf

  • DOI
    10.1109/ASRU.2013.6707774
  • Filename
    6707774