• DocumentCode
    1688626
  • Title

    Probabilistic ASR feature extraction applying context-sensitive connectionist temporal classification networks

  • Author

    Wöllmer, Martin ; Schuller, Björn ; Rigoll, Gerhard

  • Author_Institution
    BMW Group, Munich, Germany
  • fYear
    2013
  • Firstpage
    7125
  • Lastpage
    7129
  • Abstract
    This paper proposes a novel automatic speech recognition (ASR) front-end that unites the principles of bidirectional Long Short-Term Memory (BLSTM), Connectionist Temporal Classification (CTC), and Bottleneck (BN) feature generation. BLSTM networks are known to produce better probabilistic ASR features than conventional multilayer perceptrons because they can exploit a self-learned amount of temporal context for phoneme estimation. Combining BLSTM networks with a CTC output layer has the advantage that the network can be trained on unsegmented data, so the quality of phoneme prediction does not depend on potentially error-prone forced-alignment segmentations of the training set. In challenging ASR scenarios involving highly spontaneous, disfluent, and noisy speech, our BN-CTC front-end leads to remarkable word accuracy improvements and prevails over a series of previously introduced BLSTM-based ASR systems.
  • Keywords
    feature extraction; multilayer perceptrons; probability; signal classification; speech recognition; BLSTM; CTC; automatic speech recognition; bidirectional long short-term memory; bottleneck feature generation; connectionist temporal classification; forced alignment segmentations; noisy speech; phoneme estimation; phoneme prediction; probabilistic ASR; training set; word accuracy improvement; Accuracy; Hidden Markov models; Probabilistic logic; Speech; Training; Long Short-Term Memory; Tandem Features
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6639045
  • Filename
    6639045