• DocumentCode
    3422846
  • Title

    Hierarchical and parallel processing of modulation spectrum for ASR applications

  • Author

    Valente, Fabio ; Hermansky, Hynek

  • Author_Institution
    IDIAP Res. Inst., Martigny
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4165
  • Lastpage
    4168
  • Abstract
    The modulation spectrum is an efficient representation for describing dynamic information in signals. In this work we investigate how to exploit different elements of the modulation spectrum for extraction of information in automatic speech recognition (ASR). Parallel and hierarchical (sequential) approaches are investigated. Parallel processing combines the outputs of independent classifiers applied to different modulation frequency channels. Hierarchical processing uses different modulation frequency channels sequentially. Experiments are run on an LVCSR task for meeting transcription, and results are reported on the RT05 evaluation data. Processing modulation frequency channels with different classifiers provides a consistent reduction in WER (2% absolute w.r.t. the PLP baseline). Hierarchical processing outperforms parallel processing. The largest WER reduction is obtained through sequential processing moving from high to low modulation frequencies. This model is consistent with several perceptual and physiological studies on auditory processing.
  • Keywords
    speech recognition; ASR application; LVCSR task; auditory processing; automatic speech recognition; hierarchical processing; information extraction; meetings transcription; modulation spectrum; parallel processing; sequential processing; Automatic speech recognition; Band pass filters; Data mining; Filtering; Fourier transforms; Frequency modulation; Gabor filters; Neural networks; Parallel processing; Speech recognition; Hierarchical and parallel combination; LVCSR; Modulation spectrum; Multi-resolution filter; Neural Network
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2008.4518572
  • Filename
    4518572