  • DocumentCode
    117918
  • Title
    Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording
  • Author
    Wang, Longbiao ; Ren, Bo ; Ueda, Yuma ; Kai, Atsuhiko ; Teraoka, Shunta ; Fukushima, Taku
  • Author_Institution
    Nagaoka Univ. of Technol., Nagaoka, Japan
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. It combines denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous speech channels (microphones or mobile terminals), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies have focused on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain to suppress reverberation and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; we then apply automatic asynchronous mobile terminal selection and environment adaptation using speech segments from the optimal mobile terminals. The proposed method was evaluated on a reverberant version of the WSJCAM0 corpus, which was played back through a loudspeaker in a meeting room with multiple speakers and recorded by multiple far-field mobile terminals. By integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%, when using multi-condition acoustic models.
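    The headline result can be checked by hand: relative error reduction is the absolute WER drop divided by the baseline WER, i.e. (51.8 - 28.8) / 51.8 = 23.0 / 51.8 ≈ 44.4%. The minimal Python sketch below reproduces that arithmetic; the function name `relative_error_reduction` is hypothetical and is not taken from the paper.

```python
def relative_error_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - system) / baseline * 100.

    Hypothetical helper for illustration; both WERs are given in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer * 100.0


# Average WERs reported in the abstract (multi-condition acoustic models).
baseline = 51.8
proposed = 28.8
print(f"absolute reduction: {baseline - proposed:.1f} points")            # 23.0 points
print(f"relative reduction: {relative_error_reduction(baseline, proposed):.1f}%")  # 44.4%
```

    Reporting the relative figure alongside the absolute one makes results comparable across systems with very different baseline error rates.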
  • Keywords
    cepstral analysis; speech coding; speech recognition; asynchronous mobile terminals; asynchronous speech recording; automatic asynchronous speech; cepstral-domain dereverberation; denoising autoencoder; distant-talking speech recognition; environment adaptation; far-field multiple mobile terminals; large vocabulary continuous speech recognition; word error rate; Hidden Markov models; Mobile communication; Noise reduction; Reverberation; Speech; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type
    conf
  • DOI
    10.1109/APSIPA.2014.7041548
  • Filename
    7041548