• DocumentCode
    3585016
  • Title

    Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition

  • Author

    Serizel, Romain ; Giuliani, Diego

  • Author_Institution
    HLT Res. Unit, Fondazione Bruno Kessler, Trento, Italy
  • fYear
    2014
  • Firstpage
    135
  • Lastpage
    140
  • Abstract
    This paper introduces approaches based on vocal tract length normalisation (VTLN) techniques for hybrid deep neural network (DNN) - hidden Markov model (HMM) automatic speech recognition when targeting children's and adults' speech. VTLN is first investigated by training a DNN-HMM system on mel frequency cepstral coefficients (MFCCs) normalised with standard VTLN. Then, MFCC-derived acoustic features are combined with the VTLN warping factors to obtain an augmented set of features as input to a DNN. In this latter, novel approach, the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Both VTLN-based approaches are shown to improve phone error rate, by up to 20% relative, compared to a baseline trained on a mixture of children's and adults' speech.
  • Keywords
    cepstral analysis; neural nets; speech recognition; DNN-HMM automatic speech recognition; DNN-HMM system training; DNN-based adult speech recognition; DNN-based children speech recognition; VTLN techniques; VTLN warping factors; acoustic features; augmented feature set; children-adult speech mixture; decoding; hidden Markov model; hybrid deep neural network; mel frequency cepstral coefficients; normalised MFCC; phone error rate performance improvement; relative improvement; vocal tract length normalisation approach; vocal tract length normalisation techniques; Abstracts; Context; Hidden Markov models; Mel frequency cepstral coefficient; Vocal tract length normalisation; automatic speech recognition; children's speech recognition; deep neural networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2014 IEEE
  • Type

    conf

  • DOI
    10.1109/SLT.2014.7078563
  • Filename
    7078563