Title :
Broad phonetic class segmentation study for Thai automatic speech recognition
Author :
Rochkittichareon, Wittaya ; Suchato, Atiwong ; Punyabukkana, Proadpran
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
Abstract :
Automatic broad class segmentation is an important pre-processing step in speech recognition and other speech applications, such as speech transcription, where it supports the phonetic transcription of speech corpora, and pronunciation error detection at phone boundaries in language learning applications. This research aims to improve the acoustic parameters used in Thai automatic speech recognition systems. We propose acoustic parameters that capture the characteristics of the broad manner classes of Thai speech. These acoustic parameters are: 1) the spectral center of gravity and the short-time zero-crossing rate, used to classify the silence feature and the continuant feature; and 2) the energy ratio of E[0-400] to E[400-6000], used to classify the syllabic feature. The results showed error reductions of 28.09%, 11.0% and 2.41% for the continuant, syllabic and silence features, respectively, compared with the acoustic parameters used for English. The speech segmentation task achieved an accuracy of 80.46%, a 23.46% error reduction compared with the baseline HMM-MFCC-based broad class segmentation. In word recognition tasks, word classification in the CVC context performed similarly to the HMM-MFCC baseline.
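The three acoustic parameters named in the abstract can be illustrated with a minimal per-frame sketch. This is not the authors' implementation: the abstract gives no frame length, window, or normalization, so the Hamming window, the power-spectrum weighting, the reading of the band limits as Hz, and the function name `frame_features` are all assumptions made for illustration.

```python
import numpy as np

def frame_features(frame, sample_rate):
    """Return (spectral center of gravity in Hz, zero-crossing rate,
    energy ratio E[0-400] / E[400-6000]) for one speech frame.

    Hypothetical helper: windowing and band edges are assumptions,
    not details taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    eps = 1e-12  # guards against division by zero on silent frames

    # Power spectrum of the Hamming-windowed frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Spectral center of gravity: power-weighted mean frequency.
    centroid = float(np.sum(freqs * power) / (np.sum(power) + eps))

    # Short-time zero-crossing rate: fraction of adjacent sample
    # pairs whose signs differ.
    signs = np.signbit(frame)
    zcr = float(np.mean(signs[1:] != signs[:-1]))

    # Energy below 400 Hz relative to energy in the 400-6000 Hz band.
    low = power[freqs < 400.0].sum()
    high = power[(freqs >= 400.0) & (freqs <= 6000.0)].sum()
    ratio = float(low / (high + eps))

    return centroid, zcr, ratio
```

On a 25 ms frame sampled at 16 kHz, a low-frequency tone would be expected to give a low center of gravity, a low zero-crossing rate, and an energy ratio above 1, while a high-frequency tone reverses all three. How these values map onto the silence, continuant, and syllabic decisions is not specified in the abstract.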
Keywords :
speech processing; speech recognition; CVC context; English; HMM-MFCC-based broad class segmentation; Thai automatic speech recognition; acoustic parameters; automatic broad class segmentation; broad phonetic class segmentation study; continuant feature classification; energy ratio; error reductions; gravity spectral center; language learning application; phone boundaries; phonetic transcription; pronunciation error detection; short-time zero-crossing rate; silence feature classification; speech corpus; speech transcription; syllabic feature classification; word classification; word recognition; Acoustics; Frequency measurement; Hidden Markov models; Probabilistic logic; Speech; Support vector machines; phonetic feature; speech segmentation
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012 9th International Conference on
Conference_Location :
Phetchaburi
Print_ISBN :
978-1-4673-2026-9
DOI :
10.1109/ECTICon.2012.6254262