• DocumentCode
    3132020
  • Title

    Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition

  • Author

    Do, Cong-Thanh ; Taghizadeh, Mohammad J. ; Garner, Philip N.

  • Author_Institution
    LIMSI, Orsay, France
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    137
  • Lastpage
    142
  • Abstract
    This paper investigates the combination of cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition. Testing speech signals are recorded by a circular microphone array and subsequently processed with superdirective beamforming and McCowan post-filtering. Training speech signals, from the multichannel overlapping Number corpus (MONC), are clean and non-overlapping. Cochlear implant-like speech processing, which is inspired by the speech processing strategy used in cochlear implants, is applied to the training and testing speech signals. Cepstral normalization, comprising cepstral mean and variance normalization (CMN and CVN), is applied to the training and testing cepstra. Experiments show that applying either cepstral normalization or cochlear implant-like speech processing helps reduce the word error rates (WERs) of microphone array-based speech recognition. Combining the two further reduces the WERs when there is overlapping speech. Train/test mismatches are measured using the Kullback-Leibler divergence (KLD) between the global probability density functions (PDFs) of the training and testing cepstral vectors. This measure reveals a reduction in train/test mismatch when either cepstral normalization or cochlear implant-like speech processing is used. It also reveals that combining the two processing steps further reduces both the train/test mismatch and the WERs.
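The normalization and mismatch measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cmvn` and `gaussian_kld` are hypothetical, the data are synthetic, and each global PDF is assumed here to be a single diagonal-covariance Gaussian, a modeling choice the record does not specify.

```python
import numpy as np

def cmvn(cepstra, eps=1e-8):
    """Cepstral mean and variance normalization over the frame axis.

    cepstra: array of shape (num_frames, num_coefficients).
    """
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0)
    return (cepstra - mean) / (std + eps)

def gaussian_kld(p_frames, q_frames, eps=1e-8):
    """KL divergence D(P || Q) between diagonal Gaussians fitted to two frame sets.

    Assumption: each set of cepstral vectors is summarized by one
    diagonal-covariance Gaussian (not necessarily the paper's PDF model).
    """
    mu_p, var_p = p_frames.mean(axis=0), p_frames.var(axis=0) + eps
    mu_q, var_q = q_frames.mean(axis=0), q_frames.var(axis=0) + eps
    terms = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return 0.5 * float(np.sum(terms))

# Synthetic stand-ins for training and testing cepstra (13 coefficients per frame).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 13))
test = 2.0 * rng.normal(size=(400, 13)) + 3.0  # shifted and scaled: a mismatch

kld_raw = gaussian_kld(train, test)
kld_norm = gaussian_kld(cmvn(train), cmvn(test))
print(f"KLD before normalization: {kld_raw:.3f}")
print(f"KLD after CMN + CVN:      {kld_norm:.3e}")
```

In this toy setting the normalization is applied over each whole set, so the single-Gaussian divergence collapses to almost zero by construction. With per-utterance normalization and real cepstra, the reduction would be partial rather than total.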
  • Keywords
    acoustic signal processing; array signal processing; cepstral analysis; cochlear implants; filtering theory; microphone arrays; probability; speech recognition; CMN; CVN; KLD; Kullback-Leibler divergence; MONC; McCowan postfiltering; PDF; WER; cepstral mean normalization; cepstral variance normalization; circular microphone array; cochlear implant-like speech processing; multichannel overlapping number corpus; overlapping speech; probability density function; speech recognition; speech signal recording; speech signal testing; speech signal training; superdirective beamforming; testing cepstral vector; training cepstral vector; Cepstral analysis; Microphones; Speech; Speech processing; Speech recognition; Testing; Training; Cepstral normalization; Cochlear implant-like speech processing; Kullback-Leibler divergence; Microphone array speech recognition; Overlapping speech;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type

    conf

  • DOI
    10.1109/SLT.2012.6424211
  • Filename
    6424211