Title :
Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition
Author :
Xinhui Hu ; Saiko, Masahiro ; Hori, Chiori
Author_Institution :
Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan
Abstract :
Tone plays an important role in distinguishing lexical meaning in tonal languages such as Mandarin and Thai, and tone information has been shown to improve automatic speech recognition (ASR) for these languages. In this study, we incorporate tone features derived from the fundamental frequency (F0) and fundamental frequency variation (FFV) into a convolutional neural network (CNN), a state-of-the-art approach to acoustic modeling in ASR systems. Because it can reduce spectral variations and model the spectral correlations present in speech signals, the CNN is expected to model tone patterns well, since these patterns manifest mainly in the frequency domain through the F0 contour. We conduct ASR experiments on Mandarin and Thai to evaluate the effectiveness of the proposed approaches. With the help of tone features, the character error rates (CERs) for Mandarin show relative reductions of 4.3-7.1%, and the word error rates (WERs) for Thai show relative reductions of 0.41-6.26%. The CNN clearly outperforms the deep neural network (DNN), with relative CER reductions of 5.4-13.1% for Mandarin and relative WER reductions of 0.5-5.6% for Thai.
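The abstract reports its gains as relative reductions in error rate. As a minimal sketch of how such a figure is usually computed, assuming the common definition (baseline minus new, divided by baseline), the helper below may be useful. The function name `relative_reduction` and the input values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Return the relative error-rate reduction, in percent.

    Assumes the common definition: 100 * (baseline - new) / baseline.
    Both rates must use the same unit (for example, percent CER or WER).
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 100.0 * (baseline_rate - new_rate) / baseline_rate


# Hypothetical error rates, for illustration only (not results from the paper):
# a system at 20.0% error that improves to 19.0% error.
example = relative_reduction(20.0, 19.0)
print(f"relative reduction: {example:.1f}%")
```

Under this definition a relative reduction is distinct from an absolute one: in the hypothetical case above the absolute drop is one percentage point, while the relative reduction is larger because it is scaled by the baseline.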
Keywords :
natural language processing; neural nets; spectral analysis; speech recognition; ASR improvement; CER; CNN; FFV; Mandarin speech recognition improvement; Thai speech recognition improvement; acoustic modeling approach; automatic speech recognition improvement; character error rates; convolutional neural network; effectivenesses evaluation; frequency domain; fundamental frequency variation; lexical meaning; relative WER reductions; spectral correlation modeling; spectral variation reduction; speech signals; tonal languages; tone features; tone information; word error rates; Decision support systems; Radio frequency; Rail to rail inputs;
Conference_Title :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
DOI :
10.1109/APSIPA.2014.7041576