• DocumentCode
    1281512
  • Title

    Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors

  • Author

    Kumatani, Kenichi ; McDonough, John ; Raj, Bhiksha

  • Author_Institution
    Disney Res., Pittsburgh, PA, USA
  • Volume
    29
  • Issue
    6
  • fYear
    2012
  • Firstpage
    127
  • Lastpage
    140
  • Abstract
    Distant speech recognition (DSR) holds the promise of the most natural human-computer interface because it enables man-machine interactions through speech, without the necessity of donning intrusive body- or head-mounted microphones. Recognizing distant speech robustly, however, remains a challenge. This contribution provides a tutorial overview of DSR systems based on microphone arrays. In particular, we present recent work on acoustic beamforming for DSR, along with experimental results verifying the effectiveness of the various algorithms described here; beginning from a word error rate (WER) of 14.3% with a single microphone of a linear array, our state-of-the-art DSR system achieved a WER of 5.3%, which was comparable to that of 4.2% obtained with a lapel microphone. Moreover, we present an emerging technology in the area of far-field audio and speech processing based on spherical microphone arrays. Performance comparisons of spherical and linear arrays reveal that a spherical array with a diameter of 8.4 cm can provide recognition accuracy comparable to or better than that obtained with a large linear array with an aperture length of 126 cm.
  • Keywords
    acoustic signal processing; array signal processing; human computer interaction; microphone arrays; speech recognition; DSR systems; WER; acoustic beamforming; body-mounted microphones; close-talking microphones; distant speech recognition; far-field audio processing; far-field sensors; head-mounted microphones; human computer interface; lapel microphone; linear array; man-machine interactions; microphone array processing; size 126 cm; size 8.4 cm; speech processing; spherical microphone arrays; word error rate; Array signal processing; Automatic speech recognition; Microphones; Speech recognition; Tutorials;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    1053-5888
  • Type

    jour

  • DOI
    10.1109/MSP.2012.2205285
  • Filename
    6296525