Keyword spotting enhancement for video soundtrack indexing

Author

Gelin, Philippe ; Wellekens, Chris J.

Author_Institution

Dept. of Multimedia Commun., Inst. Eurecom, Sophia Antipolis, France

Volume

2

fYear

1996

fDate

3-6 Oct 1996

Firstpage

586

Abstract

Multimedia databases contain an increasing number of videos that are not easily semantically accessed. Among the useful indices that can be extracted from the soundtrack, the presence of a keyword at some place plays a prominent role. This paper deals with the specificities of such a keyword spotter and the enhancements brought to our previous technique (1996) based on frame labeling. To be useful, such a keyword spotter has to be speaker-independent. Moreover, it has to be able to detect any word from an open vocabulary. This directly implies the use of a phonemic representation of the word. These constraints usually lead to an excessively time-consuming tool. The division of the indexing process into two parts (the first one off-line, the second one at query time) allows a faster response.

Keywords

audio coding; audio recording; audio systems; indexing; multimedia computing; speech coding; speech recognition; video recording; visual databases; vocabulary; frame labeling; multimedia databases; off-line process; open vocabulary; phonemic representation; query time; response speed; semantic access; speaker-independent keyword spotting; video soundtrack indexing; word detection; Hidden Markov models; Indexing; Labeling; Lattices; Loudspeakers; Multimedia communication; Multimedia databases; Speech; Viterbi algorithm; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607429

Filename

607429