  • DocumentCode
    3167566
  • Title
    Facilitating open vocabulary spoken term detection using a multiple pass hybrid search algorithm
  • Author
    Norouzian, Atta; Rose, Richard
  • Author_Institution
    Dept. of ECE, McGill Univ., Montreal, QC, Canada
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5169
  • Lastpage
    5172
  • Abstract
    This paper presents an efficient approach to spoken term detection (STD) from unstructured audio recordings using word lattices generated off-line from an automatic speech recognition (ASR) system. The approach facilitates open vocabulary STD and focuses specifically on reducing the difference between detection performance obtained for in-vocabulary (IV) and out-of-vocabulary (OOV) search terms. Improved OOV detection performance is obtained by using a two-pass search procedure. In the first pass, candidate audio segments are retrieved from an index of word lattice paths. In the second pass, locations of OOV search terms are detected from a constrained alignment of phonemic expansions of the query terms with phoneme sequences obtained from acoustic segments using an unconstrained neural-network-based phone decoder. It is found that the combination of first pass segment retrieval and second pass term verification significantly increases STD performance for OOV query terms with no increase in search time for utterances taken from a lecture speech domain.
  • Keywords
    acoustic signal processing; audio recording; neural nets; performance evaluation; query processing; speech recognition; vocabulary; word processing; ASR system; OOV detection performance improvement; OOV query terms; OOV search term locations; acoustic segments; audio retrieval; audio segments; automatic speech recognition; constrained alignment; detection performance; first pass segment retrieval; lecture speech domain; multiple pass hybrid search algorithm; offline word lattice generation; open vocabulary STD; open vocabulary spoken term detection; out-of-vocabulary search terms; phone decoder; phoneme sequences; phonemic expansions; second pass term verification; two-pass search procedure; unconstrained neural network; unstructured audio recording; within-vocabulary search terms; Acoustics; Decoding; Indexing; Lattices; Speech; Vocabulary; Spoken term detection; automatic speech recognition; spoken utterance retrieval;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289084
  • Filename
    6289084
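
The second-pass term verification described in the abstract aligns a phonemic expansion of the query with the phone sequence decoded from a candidate segment. The record does not give the alignment constraints or costs, so the following is only a minimal sketch under stated assumptions: it uses unit-cost edit distance with a free start and end point (approximate substring matching) in place of the authors' constrained alignment. The function names `best_alignment` and `verify`, the `max_cost_ratio` threshold, and the phone labels are all hypothetical.

```python
from typing import List, Tuple


def best_alignment(query: List[str], phones: List[str]) -> Tuple[int, int]:
    """Return (cost, end) for the cheapest match of `query` inside `phones`.

    Assumption: unit costs for substitution, insertion and deletion.
    `end` is the exclusive index in `phones` where the best match ends.
    """
    # prev[j]: cost of aligning the query prefix seen so far with some
    # substring of `phones` that ends at position j. The empty prefix
    # matches anywhere for free, which lets a match start at any phone.
    prev = [0] * (len(phones) + 1)
    for i in range(1, len(query) + 1):
        curr = [i] + [0] * len(phones)  # i query phones deleted so far
        for j in range(1, len(phones) + 1):
            substitute = prev[j - 1] + (query[i - 1] != phones[j - 1])
            delete = prev[j] + 1      # query phone missing from the audio
            insert = curr[j - 1] + 1  # extra decoded phone in the audio
            curr[j] = min(substitute, delete, insert)
        prev = curr
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


def verify(query: List[str], phones: List[str],
           max_cost_ratio: float = 0.34) -> bool:
    """Accept a candidate segment if the normalized cost is small enough.

    Assumption: the threshold is an illustrative value, not one from
    the paper.
    """
    cost, _ = best_alignment(query, phones)
    return cost <= max_cost_ratio * len(query)


if __name__ == "__main__":
    query = ["k", "ae", "t"]  # hypothetical phonemic expansion
    decoded = ["dh", "ax", "k", "ae", "t", "s", "ae", "t"]
    print(best_alignment(query, decoded), verify(query, decoded))
```

Because the first row of the table is all zeros, the match may begin at any decoded phone, which fits a setting where the first pass only narrows the search to candidate segments and the second pass must still locate the term inside them.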