DocumentCode :
67872
Title :
The Inherent Temporal Precision of Phoneme Transitions
Author :
Baghai-Ravary, L.
Author_Institution :
Phonetics Lab., Univ. of Oxford, Oxford, UK
Volume :
21
Issue :
3
fYear :
2013
fDate :
March 2013
Firstpage :
579
Lastpage :
586
Abstract :
In natural speech, some phoneme transitions correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the acoustic transition from one phoneme to the next is gradual. In this paper we determine the naturally occurring groups of phonemes (regardless of conventional phonetic categories) which show similar characteristics in such behavior. These data-driven groupings could be used in the design of decision-trees for context-dependent phoneme clustering, as used in large-vocabulary speech recognition and alignment systems, or during the design of speech databases for speech synthesis systems. We use 128 different Hidden Markov Model phoneme alignment systems and a large corpus of British English speech to assess the consistency with which different phoneme transitions can be identified. The phoneme transitions are grouped automatically so as to minimize the statistical differences in behavior between members of each group. In this way we derive two sets of phonemic classes, one for the first phoneme of each phoneme-to-phoneme transition, and another for the second. The grouping of the phonemes confirms that broad phonetic classes are a significant indicator of the accuracy with which boundaries can be identified, but there are a number of exceptions and some apparent sub-divisions and mergers of accepted phonetic classes. The automatic grouping of the second phonemes results in two singletons, /Z/ and /N/ (in SAMPA notation). Finally, statistics are presented which characterize the precision with which transitions between these automatic classes can be identified. These could provide weightings to be applied to different transitions to provide a more realistic assessment when evaluating the relative accuracies of different alignment systems.
Keywords :
decision trees; hidden Markov models; speech processing; speech recognition; Hidden Markov Model; acoustic signal; acoustic transition; automatic grouping; context-dependent phoneme clustering; data-driven groupings; decision-trees; inherent temporal precision; large-vocabulary speech recognition; natural speech; naturally occurring groups; phoneme transitions; phoneme-to-phoneme transition; phonetic categories; Accuracy; Acoustics; Databases; Hidden Markov models; Speech; Speech recognition; Vocabulary; Broad phonetic classes; phoneme alignment; speech analysis
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2227739
Filename :
6353543