  • DocumentCode
    180480
  • Title
    Efficient spoken term detection using confusion networks
  • Author
    Mangu, Lidia ; Kingsbury, Brian ; Soltau, Hagen ; Kuo, Hong-Kwang ; Picheny, Michael
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    7844
  • Lastpage
    7848
  • Abstract
    In this paper, we present a fast, vocabulary-independent algorithm for spoken term detection (STD) that demonstrates that a word-based index is sufficient to achieve good performance for both in-vocabulary (IV) and out-of-vocabulary (OOV) terms. Previous approaches have required that a separate index be built at the sub-word level and then expanded to allow for matching OOV terms. Such a process, while accurate, is expensive in both time and memory. In the proposed architecture, a word-level confusion network (CN) based index is used for both IV and OOV search. This is implemented using a flexible WFST framework. Comparisons on three Babel languages (Tagalog, Pashto, and Turkish) show that CN-based indexing performs better than the lattice approach while being orders of magnitude faster and having a much smaller footprint.
  • Keywords
    speech processing; vocabulary; 3 Babel language; CN-based indexing; IV term; OOV term; Pashto; STD; Tagalog; Turkish; flexible WFST framework; in-vocabulary term; out-of-vocabulary term; spoken term detection; vocabulary independent algorithm; word-based index; word-level confusion network; Acoustics; Indexing; Lattices; Speech; Transducers; Vocabulary; audio indexing; confusion networks; keyword search; keyword spotting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6855127
  • Filename
    6855127