On using heterogeneous data for vehicle-based speech recognition: A DNN-based approach

Author

Feng, Xue ; Richardson, Brigitte ; Amman, Scott ; Glass, James

Author_Institution

MIT Comput. Sci. & Artificial Intell. Lab., Cambridge, MA, USA

fYear

2015

fDate

19-24 April 2015

Firstpage

4385

Lastpage

4389

Abstract

Most automatic speech recognition (ASR) systems use a single source of information about their input: features and transformations derived from the speech signal. In many applications, however, such as vehicle-based speech recognition, sensor data and environmental information are also available to complement the audio. In this paper, we show how these data can be used to improve hybrid DNN-HMM ASR systems on a vehicle-based speech recognition task. Feature fusion is accomplished by augmenting the acoustic features with additional side information before they are presented to the DNN acoustic model. The additional features are extracted from the vehicle speed, HVAC status, windshield wiper status, and vehicle type. This supplementary information improves the DNN's ability to discriminate phonetic events in an environment-aware way without requiring any modification to the DNN training algorithms. Experimental results show that the heterogeneous data are effective regardless of whether cross-entropy (CE) or sequence training is used: with CE training, the word error rate (WER) is reduced by 6.3%, and with sequence training it is reduced by 5.5%.
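The feature-fusion step described in the abstract (appending side information to the acoustic features before the DNN input layer) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the paper's encoding: the category inventories (`HVAC_LEVELS`, `WIPER_STATES`, `VEHICLE_TYPES`), the speed normalization constant `MAX_SPEED_KMH`, the 40-dimensional acoustic frames, and the choice to hold the side information constant over an utterance are all hypothetical.

```python
import numpy as np

# Hypothetical category inventories and normalization constant; the abstract
# does not specify how the paper encodes these signals.
HVAC_LEVELS = ["off", "low", "medium", "high"]
WIPER_STATES = ["off", "slow", "fast"]
VEHICLE_TYPES = ["sedan", "suv", "truck"]
MAX_SPEED_KMH = 160.0  # assumed ceiling used to scale speed into [0, 1]


def one_hot(value, inventory):
    """Return a one-hot vector marking the position of `value` in `inventory`."""
    vec = np.zeros(len(inventory), dtype=np.float32)
    vec[inventory.index(value)] = 1.0
    return vec


def side_info_vector(speed_kmh, hvac, wiper, vehicle):
    """Encode vehicle metadata as one fixed-length vector (assumed encoding)."""
    speed = np.array([min(speed_kmh, MAX_SPEED_KMH) / MAX_SPEED_KMH], dtype=np.float32)
    return np.concatenate([
        speed,
        one_hot(hvac, HVAC_LEVELS),
        one_hot(wiper, WIPER_STATES),
        one_hot(vehicle, VEHICLE_TYPES),
    ])


def augment_frames(acoustic, side):
    """Append the same side-information vector to every acoustic frame.

    acoustic: (num_frames, feat_dim) array, e.g. filterbank or MFCC frames.
    side: (side_dim,) array produced by side_info_vector.
    Returns a (num_frames, feat_dim + side_dim) array for the DNN input layer.
    """
    tiled = np.tile(side, (acoustic.shape[0], 1))
    return np.hstack([acoustic.astype(np.float32), tiled])


if __name__ == "__main__":
    frames = np.random.randn(300, 40)  # illustrative: 300 frames of 40-dim features
    side = side_info_vector(80.0, "high", "slow", "suv")
    dnn_input = augment_frames(frames, side)
    print(dnn_input.shape)  # (300, 51): 40 acoustic + 1 speed + 4 + 3 + 3 one-hot dims
```

Because the side information is simply concatenated to the input, the DNN itself and its CE or sequence training procedure are unchanged; only the input dimension grows. In a practical system the acoustic part would typically be a spliced window of context frames, with the side-information vector appended once per input.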

Keywords

entropy; learning (artificial intelligence); neural nets; speech processing; speech recognition; vehicles; DNN acoustic model; DNN training algorithm; DNN-HMM ASR system; HVAC status; WER reduction; acoustic feature augmentation; automatic speech recognition; cross-entropy; deep neural network; feature fusion; heterogeneous data; phonetic event discrimination; sensor data; sequence training; speech signal; vehicle speed; vehicle type; vehicle-based speech recognition; windshield wiper status; Computational modeling; Hidden Markov models; Mel frequency cepstral coefficient; Robustness; Speech; Vehicles; Additional Feature for ASR; Condition-aware DNN; Deep Neural Network; Noise Robustness;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178799

Filename

7178799