Perceptual audio features for unsupervised key-phrase detection

Author

Von Zeddelmann, Dirk; Kurth, Frank; Müller, Meinard

Author_Institution

KOM Dept., Fraunhofer-FKIE, Wachtberg, Germany

fYear

2010

fDate

14-19 March 2010

Firstpage

257

Lastpage

260

Abstract

We propose a new type of audio feature (HFCC-ENS) as well as an unsupervised method for detecting short sequences of spoken words (key-phrases) within long speech recordings. Our technical contributions are threefold: Firstly, we propose to use bandwidth-adapted filterbanks instead of classical MFCC-style filters in the feature extraction step. Secondly, the time resolution of the resulting features is adapted to account for the temporal characteristics of the spoken phrases. Thirdly, the key-phrase detection step is performed by matching sequences of the resulting HFCC-ENS features with features extracted from a target speech recording. We evaluate the proposed method using the German Kiel Corpus and furthermore investigate speech-related properties of the proposed feature.
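The second and third steps described in the abstract (adapting the time resolution of the features and matching feature sequences against a target recording) can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes frame-level filterbank features, for example HFCC vectors, have already been computed. Everything else is an illustrative assumption loosely modeled on energy-normalized-statistics (ENS) style post-processing and diagonal sequence matching: the hypothetical function names smooth_and_downsample and matching_cost, the Hann smoothing window, the window length of 41 frames, the downsampling factor of 10, and the cosine-based cost. The random arrays in the usage part only stand in for real speech features.

```python
import numpy as np


def smooth_and_downsample(features, win_len=41, hop=10):
    """Hypothetical ENS-style temporal adaptation (assumed, not taken from
    the paper): smooth each feature dimension over time with a normalised
    Hann window, keep every `hop`-th frame, then L2-normalise each frame.

    features: array of shape (num_frames, num_dims), e.g. HFCC vectors.
    Returns an array of shape (ceil(num_frames / hop), num_dims).
    """
    if features.shape[0] < win_len:
        raise ValueError("need at least win_len frames")
    window = np.hanning(win_len)
    window /= window.sum()
    # Convolve every feature dimension (column) along the time axis;
    # mode="same" keeps the number of frames unchanged.
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, window, mode="same"), 0, features)
    down = smoothed[::hop]
    norms = np.linalg.norm(down, axis=1, keepdims=True)
    # Guard against division by zero for all-zero (silent) frames.
    return down / np.maximum(norms, 1e-12)


def matching_cost(query, target):
    """Diagonal matching cost of `query` (M x D) against every length-M
    segment of `target` (N x D). Frames are assumed unit-norm, so each
    inner product is a cosine similarity and every cost lies in [0, 2].
    Returns an array of length N - M + 1; low values mark likely matches.
    """
    m, n = len(query), len(target)
    if m > n:
        raise ValueError("query is longer than target")
    sim = target @ query.T  # (N, M) frame-by-frame cosine similarities
    idx = np.arange(m)
    costs = np.empty(n - m + 1)
    for start in range(n - m + 1):
        # Average similarity along the diagonal that starts at `start`.
        costs[start] = 1.0 - sim[start + idx, idx].mean()
    return costs


# Usage with random stand-in data (no real speech involved).
rng = np.random.default_rng(0)
target_raw = rng.standard_normal((2000, 20))  # 2000 frames, 20 coefficients
query_raw = target_raw[800:1200]              # "key-phrase" cut from target
target = smooth_and_downsample(target_raw)    # 200 frames
query = smooth_and_downsample(query_raw)      # 40 frames
costs = matching_cost(query, target)
print(int(np.argmin(costs)))  # 80, i.e. 800 / hop
```

Local minima of the cost curve mark candidate key-phrase positions in the target. Smoothing before downsampling makes the coarse feature sequence tolerant to small local timing differences. A threshold on the cost, which this sketch leaves open, would decide which candidates are reported.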

Keywords

cepstral analysis; feature extraction; speech recognition; German Kiel Corpus; MFCC-style filters; feature extraction step; perceptual audio features; speech recordings; spoken word sequences; unsupervised key-phrase detection; Audio recording; Bandwidth; Feature extraction; Filters; Frequency; Hidden Markov models; Humans; Robustness; Speech processing; Statistics; HFCC; Speech features; key-phrase detection; key-phrase spotting

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5495974

Filename

5495974