Detection-based speech recognition with sparse point process models

Author

Jansen, Aren; Niyogi, Partha

Author_Institution

HLT Center of Excellence, Johns Hopkins Univ., Baltimore, MD, USA

fYear

2010

fDate

14-19 March 2010

Firstpage

4362

Lastpage

4365

Abstract

We present a bottom-up approach to connected digit recognition in which (i) the speech signal is transformed into a sparse set of acoustic events in time, (ii) point process models (PPMs) of these events are used to detect candidate digit occurrences, and (iii) the candidate digit detections are reduced to a single digit sequence prediction using a previously proposed graph-based optimization. We find that the performance of this detection-based system on the AURORA2 evaluation matches that of an HTK baseline in clean speech, and that the system provides improved robustness to non-stationary noise. Similar robustness to stationary noise sources is achieved with unsupervised PPM adaptation using small amounts of the noisy data.
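Step (ii) of the abstract, scoring a time window with a point process model of sparse events, can be illustrated with a minimal sketch. The abstract does not give the model's form, so everything here is an assumption. Each event type (a hypothetical label such as a phone detector's name) is modeled under the word hypothesis as an inhomogeneous Poisson process on time normalized to [0, 1), with a rate that is constant over each of D equal divisions. Under the background hypothesis it is modeled as a homogeneous Poisson process with a fixed rate in events per second. A window is scored by the log-likelihood ratio of the two. The names `ppm_log_ratio` and `detect`, the rate tables, the fixed window duration, the hop size, and the threshold are all hypothetical. Step (iii), the graph-based reduction of detections, is not sketched.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse representation: event type -> firing times (seconds).
Events = Dict[str, List[float]]


def ppm_log_ratio(events: Events, start: float, duration: float,
                  word_rates: Dict[str, List[float]],
                  bg_rates: Dict[str, float],
                  min_rate: float = 1e-6) -> float:
    """Log-likelihood ratio of window [start, start + duration):
    assumed word model (piecewise-constant inhomogeneous Poisson in
    normalized time) versus assumed homogeneous Poisson background.
    Only event types listed in word_rates are scored."""
    log_ratio = 0.0
    for p, rates in word_rates.items():
        n_div = len(rates)
        counts = [0] * n_div
        for t in events.get(p, []):
            if start <= t < start + duration:
                s = (t - start) / duration  # normalized time in [0, 1)
                counts[min(int(s * n_div), n_div - 1)] += 1
        n_total = sum(counts)
        # Word model: sum_d [n_d * log(lambda_d) - lambda_d / D], then
        # subtract n * log(duration), the Jacobian of mapping seconds
        # to normalized time.
        word_ll = sum(c * math.log(max(r, min_rate)) - r / n_div
                      for c, r in zip(counts, rates))
        word_ll -= n_total * math.log(duration)
        # Background: homogeneous Poisson over the same window, in seconds.
        mu = max(bg_rates[p], min_rate)
        bg_ll = n_total * math.log(mu) - mu * duration
        log_ratio += word_ll - bg_ll
    return log_ratio


def detect(events: Events, duration: float,
           word_rates: Dict[str, List[float]], bg_rates: Dict[str, float],
           threshold: float, hop: float = 0.01) -> List[Tuple[float, float]]:
    """Slide a fixed-duration window every `hop` seconds up to the last
    event and return (start, score) for windows scoring above threshold."""
    t_end = max((t for ts in events.values() for t in ts), default=0.0)
    hits = []
    k = 0
    while k * hop <= t_end:
        start = k * hop
        score = ppm_log_ratio(events, start, duration, word_rates, bg_rates)
        if score > threshold:
            hits.append((start, score))
        k += 1
    return hits
```

For example, with `word_rates = {"a": [2.0, 0.1], "b": [0.1, 2.0]}` (the word expects an "a" early and a "b" late), `bg_rates = {"a": 1.0, "b": 1.0}`, and events `{"a": [0.1], "b": [0.4]}`, a 0.5 s window starting at 0.0 s scores positive. A window starting at 0.3 s, which sees only the "b" in its early half, scores negative. A practical detector would presumably also search over several window durations rather than the single fixed duration assumed here.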

Keywords

optimisation; speech recognition; AURORA2 evaluation matches; HTK baseline; acoustic events; connected digit recognition; detection-based speech recognition; detection-based system; graph-based optimization; noisy data; sparse point process models; speech signal; stationary noise sources; unsupervised PPM adaptation; Acoustic signal detection; Decoding; Detectors; Event detection; Hidden Markov models; Noise robustness; Predictive models; Speech processing; Speech recognition; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5495636

Filename

5495636