• DocumentCode
    730681
  • Title
    On using heterogeneous data for vehicle-based speech recognition: A DNN-based approach
  • Author
    Feng, Xue; Richardson, Brigitte; Amman, Scott; Glass, James
  • Author_Institution
    MIT Comput. Sci. & Artificial Intell. Lab., Cambridge, MA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4385
  • Lastpage
    4389
  • Abstract
    Most automatic speech recognition (ASR) systems incorporate a single source of information about their input, namely, features and transformations derived from the speech signal. However, in many applications, e.g., vehicle-based speech recognition, sensor data and environmental information are often available to complement audio information. In this paper, we show how these data can be used to improve hybrid DNN-HMM ASR systems for a vehicle-based speech recognition task. Feature fusion is accomplished by augmenting acoustic features with additional side information before they are presented to the DNN acoustic model. The additional features are extracted from the vehicle speed, HVAC status, windshield wiper status, and vehicle type. This supplementary information improves the DNN's ability to discriminate phonetic events in an environment-aware way without requiring any modification to the DNN training algorithms. Experimental results show that heterogeneous data are effective irrespective of whether cross-entropy (CE) or sequence training is used. With CE training, a WER reduction of 6.3% is obtained; with sequence training, the reduction is 5.5%.
  • Keywords
    entropy; learning (artificial intelligence); neural nets; speech processing; speech recognition; vehicles; DNN acoustic model; DNN training algorithm; DNN-HMM ASR system; HVAC status; WER reduction; acoustic feature augmentation; automatic speech recognition; cross-entropy; deep neural network; feature fusion; heterogeneous data; phonetic event discrimination; sensor data; sequence training; speech signal; vehicle speed; vehicle type; vehicle-based speech recognition; windshield wiper status; Computational modeling; Hidden Markov models; Mel frequency cepstral coefficient; Robustness; Speech; Vehicles; Additional Feature for ASR; Condition-aware DNN; Deep Neural Network; Noise Robustness
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178799
  • Filename
    7178799
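
The feature fusion summarized in the abstract (side information appended to the acoustic features before they reach the DNN acoustic model) might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the function name `fuse_features`, the scaling constants, the one-hot encoding of vehicle type, and the 40-dimensional acoustic frames are all assumptions, since the record does not say how the side information is encoded.

```python
import numpy as np


def fuse_features(acoustic, speed_kmh, hvac_level, wiper_on, vehicle_id,
                  n_vehicle_types, max_speed_kmh=200.0, max_hvac_level=7):
    """Append utterance-level side information to every acoustic frame.

    All encodings here are hypothetical choices made for illustration:
    speed and HVAC level are scaled to [0, 1], wiper status is a 0/1 flag,
    and vehicle type is one-hot encoded.
    """
    acoustic = np.asarray(acoustic, dtype=np.float32)  # shape: (frames, dims)

    # One-hot vector for the vehicle type (assumed encoding).
    vehicle_one_hot = np.zeros(n_vehicle_types, dtype=np.float32)
    vehicle_one_hot[vehicle_id] = 1.0

    # Scalar side features followed by the one-hot vehicle type.
    scalars = np.array(
        [speed_kmh / max_speed_kmh, hvac_level / max_hvac_level, float(wiper_on)],
        dtype=np.float32,
    )
    side = np.concatenate((scalars, vehicle_one_hot))

    # Repeat the same side vector for each frame, then concatenate column-wise.
    tiled = np.tile(side, (acoustic.shape[0], 1))
    return np.hstack((acoustic, tiled))


# Usage: 5 frames of 40-dimensional acoustic features (placeholder zeros).
frames = np.zeros((5, 40), dtype=np.float32)
fused = fuse_features(frames, speed_kmh=100.0, hvac_level=3, wiper_on=True,
                      vehicle_id=1, n_vehicle_types=3)
print(fused.shape)
```

Because the fused matrix is just a wider input, a DNN acoustic model could consume it by enlarging its input layer, which is in line with the abstract's remark that no change to the training algorithms is needed.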