DocumentCode :
179342
Title :
Mandarin tone classification without pitch tracking
Author :
Ryant, Neville ; Yuan, Jiahong ; Liberman, Mark
Author_Institution :
Linguistic Data Consortium, Univ. of Pennsylvania, Philadelphia, PA, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
4868
Lastpage :
4872
Abstract :
A deep neural network (DNN) based classifier achieved 27.38% frame error rate (FER) and 15.62% segment error rate (SER) in recognizing five tonal categories in Mandarin Chinese broadcast news, based on 40 mel-frequency cepstral coefficients (MFCCs). The same architecture performed substantially worse when trained and tested with F0 and amplitude parameters alone: 40.05% FER and 22.66% SER. These results are substantially better than the best previously reported results on broadcast-news tone classification [1], and are also better than the results a human listener achieved in categorizing test stimuli created by amplitude- and frequency-modulating complex tones to match the extracted F0 and amplitude parameters.
Keywords :
amplitude modulation; frequency modulation; neural nets; pattern classification; speech recognition; DNN based classifier; FER; MFCC; Mandarin Chinese broadcast news; Mandarin tone classification; SER; amplitude-modulating complex tone; deep neural network based classifier; frame error rate; frequency-modulating complex tone; mel-frequency cepstral coefficient; pitch tracking; segment error rate; tonal recognition; Context; Error analysis; Neural networks; Speech; Training; Mandarin; deep neural networks; tone modeling
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854527
Filename :
6854527