Title :
Multiple time-span feature fusion for deep neural network modeling
Author :
Chongjia Ni ; Nancy F. Chen ; Bin Ma
Author_Institution :
Inst. for Infocomm Res., A*STAR, Singapore, Singapore
Abstract :
In this paper, we exploit long-term information from multiple time-spans for automatic speech recognition. The multiple time-span information is encoded into three different feature streams: speaker-adaptation-transformed features, deep bottleneck features, and deep hierarchical bottleneck features. Combining the three time-spans in discriminative acoustic modeling reduces the character error rate for Mandarin and the syllable error rate for Vietnamese conversational telephone speech recognition. We obtain absolute reductions of 0.8% in character error rate for Mandarin and 1.9% in syllable error rate for Vietnamese over the DNN-HMM baselines. Further analysis also suggests that our proposed feature fusion approach is able to encode finer-grained temporal information than directly using input features of long time-spans in DNN-HMM baselines.
Keywords :
acoustic signal processing; error statistics; feature extraction; hidden Markov models; natural language processing; neural nets; sensor fusion; speech recognition; DNN-HMM baselines; Mandarin language; Vietnamese language; automatic speech recognition; character error rate; conversational telephone speech recognition; deep hierarchical bottleneck features; deep neural network modeling; discriminative acoustic modeling; feature streams; hidden Markov model; long term information; multiple time-span feature fusion; multiple time-span information; speaker-adaptation-transformed features; syllable error rate; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Feature representation; Hidden Markov model (HMM); deep bottleneck; deep hierarchical bottleneck; deep neural network (DNN);
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936707