• DocumentCode
    134321
  • Title
    Multiple time-span feature fusion for deep neural network modeling
  • Author
    Chongjia Ni; Nancy F. Chen; Bin Ma
  • Author_Institution
    Inst. for Infocomm Res., A*STAR, Singapore, Singapore
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    138
  • Lastpage
    142
  • Abstract
    In this paper, we exploit long-term information from multiple time-spans for automatic speech recognition. The multiple time-span information is encoded into three different feature streams: speaker-adaptation-transformed features, deep bottleneck features, and deep hierarchical bottleneck features. By combining three different time-spans in discriminative acoustic modeling, we improve the character error rate for Mandarin and the syllable error rate for Vietnamese conversational telephone speech recognition. Over DNN-HMM baselines, we obtain absolute improvements of 0.8% in character error rate for Mandarin and 1.9% in syllable error rate for Vietnamese. Further analysis also suggests that our proposed feature fusion approach is able to encode finer-grained temporal information than directly using input features of long time-spans in DNN-HMM baselines.
  • Keywords
    acoustic signal processing; error statistics; feature extraction; hidden Markov models; natural language processing; neural nets; sensor fusion; speech recognition; DNN-HMM baselines; Mandarin language; Vietnamese language; automatic speech recognition; character error rate; conversational telephone speech recognition; deep hierarchical bottleneck features; deep neural network modeling; discriminative acoustic modeling; feature streams; hidden Markov model; long term information; multiple time-span feature fusion; multiple time-span information; speaker-adaptation-transformed features; syllable error rate; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Feature representation; Hidden Markov model (HMM); deep bottleneck; deep hierarchical bottleneck; deep neural network (DNN)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936707
  • Filename
    6936707