• DocumentCode
    3428930
  • Title
    Convolutional Neural Networks-based continuous speech recognition using raw speech signal
  • Author
    Palaz, Dimitri ; Magimai-Doss, Mathew ; Collobert, Ronan
  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4295
  • Lastpage
    4299
  • Abstract
    State-of-the-art automatic speech recognition systems model the relationship between the acoustic speech signal and phone classes in two stages: extraction of spectral-based features using prior knowledge, followed by training of an acoustic model, typically an artificial neural network (ANN). In our recent work, it was shown that Convolutional Neural Networks (CNNs) can model phone classes from the raw acoustic speech signal, reaching performance on par with existing feature-based approaches. This paper extends the CNN-based approach to a large vocabulary speech recognition task. More precisely, we compare the CNN-based approach against the conventional ANN-based approach on the Wall Street Journal corpus. Our studies show that the CNN-based approach achieves better performance than the conventional ANN-based approach with the same number of parameters. We also show that the features learned from raw speech by the CNN-based approach can generalize across different databases.
  • Keywords
    acoustic signal processing; feature extraction; learning (artificial intelligence); neural nets; speech recognition; ANN approach; CNN approach; Wall Street Journal corpus; acoustic model training; artificial neural network; continuous speech recognition; convolutional neural network; feature learning; large vocabulary speech recognition task; phone classes; prior knowledge; raw acoustic speech signal; spectral-based feature extraction; state-of-the-art automatic speech recognition system model; Acoustics; Convolution; Feature extraction; Hidden Markov models; Neural networks; Speech; Speech recognition; automatic speech recognition; convolutional neural networks; feature learning; raw signal;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178781
  • Filename
    7178781