Facilitating open vocabulary spoken term detection using a multiple pass hybrid search algorithm

Author

Norouzian, Atta ; Rose, Richard

Author_Institution

Dept. of ECE, McGill Univ., Montreal, QC, Canada

fYear

2012

fDate

25-30 March 2012

Firstpage

5169

Lastpage

5172

Abstract

This paper presents an efficient approach to spoken term detection (STD) in unstructured audio recordings that uses word lattices generated off-line by an automatic speech recognition (ASR) system. The approach facilitates open vocabulary STD and focuses specifically on reducing the gap between the detection performance obtained for in-vocabulary (IV) and out-of-vocabulary (OOV) search terms. Improved OOV detection performance is obtained with a two-pass search procedure. In the first pass, candidate audio segments are retrieved from an index of word lattice paths. In the second pass, the locations of OOV search terms are detected by a constrained alignment of the phonemic expansions of the query terms with phoneme sequences obtained from acoustic segments by an unconstrained, neural-network-based phone decoder. On utterances taken from a lecture speech domain, combining first-pass segment retrieval with second-pass term verification is found to significantly increase STD performance for OOV query terms with no increase in search time.
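The two-pass idea in the abstract can be illustrated with a small, hypothetical Python sketch. It is not the authors' implementation, and every function name, parameter, and data value below is an assumption made for illustration. For the first pass, it assumes that each indexed word lattice path has already been expanded to phones through a pronunciation lexicon, so that an OOV query can be matched against phone trigrams. For the second pass, it swaps in a plain unit-cost, semi-global edit distance as the "constrained alignment": the whole query must be aligned, but it may start and end anywhere in the decoded phone string. The acceptance threshold `max_norm_dist` is likewise an arbitrary illustrative value.

```python
from collections import defaultdict

def phone_ngrams(phones, n=3):
    """Set of phone n-grams in a sequence (hypothetical index key)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}

def build_index(lattice_phones, n=3):
    """Map each phone n-gram to the IDs of segments whose lattice-path phone
    expansion contains it. lattice_phones: segment ID -> phone list (assumed)."""
    index = defaultdict(set)
    for seg_id, phones in lattice_phones.items():
        for gram in phone_ngrams(phones, n):
            index[gram].add(seg_id)
    return index

def first_pass(index, query_phones, n=3, min_hits=1):
    """Retrieve candidate segments ranked by the number of shared query n-grams."""
    hits = defaultdict(int)
    for gram in phone_ngrams(query_phones, n):
        for seg_id in index.get(gram, ()):
            hits[seg_id] += 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seg_id for seg_id, count in ranked if count >= min_hits]

def semi_global_distance(query, decoded):
    """Unit-cost edit distance between the whole query and its best-matching
    substring of the decoded phone sequence (free start and free end)."""
    n = len(decoded)
    prev = [0] * (n + 1)            # row 0 is all zeros: the match may start anywhere
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * n         # query[:i] against an empty prefix costs i deletions
        for j in range(1, n + 1):
            sub = prev[j - 1] + (query[i - 1] != decoded[j - 1])
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)                # free end: best cost over all end positions

def second_pass(query_phones, decoded_phones, candidates, max_norm_dist=0.25):
    """Verify candidates against phone-decoder output; keep close matches."""
    detections = []
    for seg_id in candidates:
        dist = semi_global_distance(query_phones, decoded_phones[seg_id])
        if dist / len(query_phones) <= max_norm_dist:
            detections.append((seg_id, dist))
    return detections

if __name__ == "__main__":
    # Toy, made-up data: an OOV query "viterbi" in ARPAbet-like phones.
    query = ["v", "ih", "t", "er", "b", "iy"]
    lattice_phones = {
        "seg_a": ["dh", "ax", "v", "ih", "t", "er", "b", "iy", "ax", "l"],
        "seg_b": ["b", "ih", "t", "er", "b", "iy", "t"],
        "seg_c": ["s", "p", "iy", "ch"],
    }
    decoded_phones = {
        "seg_a": ["dh", "ax", "v", "ih", "t", "er", "b", "iy", "l"],
        "seg_b": ["b", "ih", "d", "er", "b", "iy", "t"],
        "seg_c": ["s", "p", "iy", "ch"],
    }
    candidates = first_pass(build_index(lattice_phones), query)
    print(candidates)                                         # ['seg_a', 'seg_b']
    print(second_pass(query, decoded_phones, candidates))     # [('seg_a', 0)]
```

In this toy run the first pass never touches `seg_c`, which mirrors the abstract's point that the more expensive phone-level alignment is restricted to a small set of retrieved segments. A practical system would more likely use phone confusion costs or posteriors rather than unit costs, and would return time boundaries rather than segment IDs.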

Keywords

acoustic signal processing; audio recording; neural nets; performance evaluation; query processing; speech recognition; vocabulary; word processing; ASR system; OOV detection performance improvement; OOV query terms; OOV search term locations; acoustic segments; audio retrieval; audio segments; automatic speech recognition; constrained alignment; detection performance; first pass segment retrieval; lecture speech domain; multiple pass hybrid search algorithm; offline word lattice generation; open vocabulary STD; open vocabulary spoken term detection; out-of-vocabulary search terms; phone decoder; phoneme sequences; phonemic expansions; second pass term verification; two-pass search procedure; unconstrained neural network; unstructured audio recording; within-vocabulary search terms; Acoustics; Decoding; Indexing; Lattices; Speech; Vocabulary; Spoken term detection; automatic speech recognition; spoken utterance retrieval;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6289084

Filename

6289084