Title :
Pitch tracking and tone features for Mandarin speech recognition
Author :
Huang, Hank Chang-Han ; Seide, Frank
Author_Institution :
Philips Innovation Center, Taipei, Taiwan
Abstract :
Tone modeling is a critical component of Mandarin large-vocabulary continuous-speech recognition (LVCSR) systems. This paper presents an efficient real-time pitch tracker and a set of tone features that achieve a 30% reduction of the character error rate (CER) compared to the non-tonal baseline. To our knowledge, this is the highest improvement from tones ever reported for Mandarin. The paper first discusses adapting a known pitch-tracking algorithm for real-time operation. Second, we study the derivation of tone features for Mandarin LVCSR. Compared to the baseline vector (F0, ΔF0), our best tone features lead to a 28% reduction of tone errors. Results are shown for three LVCSR databases, including the Chinese 1998 National Performance Assessment (Project 863) and the Taiwan telephony database “MAT.” The system reaches the performance of Western-language systems, and for the “863 System Performance Test” it achieves 1.5% CER.
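As a reading aid for the error figures quoted above, the following is a minimal sketch of how a character error rate and a reduction between two systems are commonly computed. It assumes CER is the Levenshtein (edit) distance between reference and hypothesis character strings divided by the reference length, and that the reported reductions are relative to the baseline; the abstract states neither explicitly. The function names and the example sentence are hypothetical and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER assumed here as edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical example: one substituted character in a six-character sentence.
example_cer = character_error_rate("今天天气很好", "今天天汽很好")
```

Under these assumptions, a system that lowers a baseline error rate from 10% to 7% would show a relative reduction of 30%; these two rates are illustrative values only, not results from the paper.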
Keywords :
error statistics; feature extraction; frequency estimation; natural languages; speech recognition; tracking; 863 System Performance Test; Chinese 1998 National Performance Assessment (Project 863); LVCSR databases; MAT; Mandarin large-vocabulary continuous-speech recognition systems; Mandarin speech recognition; Taiwan telephony database; character error rate; pitch tracking; tone features; Acoustic measurements; Dynamic programming; Error analysis; Natural languages; Spatial databases; Speech recognition; System testing; Technological innovation; Telephony; Vocabulary
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.861942