Fusion based speech segmentation in DARPA SPINE2 task

Author

Zheng, Chengyi ; Yan, Yonghong

Author_Institution

Comput. Sci. & Eng. Dept., Oregon Health & Sci. Univ., Beaverton, OR, USA

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

We report a new fusion-based segmentation approach that uses multiple filter bank coefficients. The approach reuses the existing feature extraction procedure and adds little computational cost. A second level of fusion combines several segmentation systems. Evaluation was conducted on the second Speech In Noisy Environments (SPINE2) task. Experiments show that our fusion-based approaches significantly reduce the word error rate (WER) compared with two classifier-based approaches. Compared with manual segmentation, our approach increases WER by only 0.3%.

Keywords

channel bank filters; error statistics; feature extraction; sensor fusion; speech recognition; ASR; DARPA SPINE2 task; Speech In Noisy Environments task; WER reduction; automatic speech recognition; fusion based segmentation; multiple filter bank coefficients; speech segmentation; Acoustic noise; Automatic speech recognition; Computer science; Decoding; Hidden Markov models; Loudspeakers; Military aircraft; Speech recognition; Streaming media; Working environment noise

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326128

Filename

1326128