Author Institution:
University of Tokyo, Bunkyo-ku, Tokyo, Japan
Abstract:
Spectral analysis of Japanese vowels shows that the five vowels /a/, /e/, /i/, /o/, and /u/ of a single speaker can be well separated by their first and second formant frequencies (F1 and F2). A considerable amount of overlap is observed, however, when the vowels of many speakers are plotted in the F1-F2 plane, which can be ascribed mainly to differences in the size and shape of the vocal tract. A normalizing process, based presumably on higher formant frequencies, is expected in the identification of these vowels. It is not clear, however, whether concurrent changes of pitch and higher formants are necessary in the normalization process. This paper presents a method for evaluating the roles of these parameters and describes the results obtained. Perceptual boundaries between a pair of vowels that share approximately the same ratio of F2 to F1 are defined in the F1-F2 plane, using synthetic vowels generated by a terminal analog synthesizer. The importance of pitch and higher formants is then evaluated by the extent to which their changes affect these boundaries. The results of listening tests show that, for ordinary buzz-excited vowels, neither pitch nor higher formants alone are sufficient for perceptual normalization, and combined changes in pitch and higher formants are necessary to counteract the changes in F1 and F2. For noise-excited vowels, on the other hand, the role of higher formants is as important as the combined role of pitch and higher formants in buzz-excited vowels.
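
To illustrate how a perceptual boundary of this kind might be located from listening-test data, the following Python sketch places stimuli along a line of constant F2/F1 ratio and finds the F1 value at which listeners' responses switch from one vowel to the other. The abstract does not state how the boundaries were estimated. The 50% crossover criterion, the linear interpolation, the function names, and every number in the usage example are assumptions made for illustration only; they are not the authors' procedure or data.

```python
from typing import Sequence


def stimulus_f2(f1_hz: float, ratio: float) -> float:
    """F2 (Hz) of a stimulus lying on the constant-ratio line F2 = ratio * F1."""
    return f1_hz * ratio


def boundary_f1(f1_hz: Sequence[float],
                frac_a: Sequence[float],
                criterion: float = 0.5) -> float:
    """Estimate the F1 value (Hz) at which the fraction of 'vowel A'
    responses crosses `criterion`.

    Assumption: the boundary is the first crossing, found by linear
    interpolation between adjacent stimuli that bracket the criterion.
    """
    if len(f1_hz) != len(frac_a) or len(f1_hz) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    points = list(zip(f1_hz, frac_a))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        brackets = (y0 - criterion) * (y1 - criterion) <= 0
        if brackets and y0 != y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("responses never cross the criterion")


def boundary_shift(f1_hz: Sequence[float],
                   frac_a_reference: Sequence[float],
                   frac_a_changed: Sequence[float]) -> float:
    """Boundary displacement (Hz in F1) when another parameter, such as
    pitch or the higher formants, is changed between two conditions."""
    return boundary_f1(f1_hz, frac_a_changed) - boundary_f1(f1_hz, frac_a_reference)


if __name__ == "__main__":
    # Hypothetical stimuli and response fractions, not taken from the paper.
    f1_values = [300.0, 400.0, 500.0, 600.0]
    reference = [1.0, 0.8, 0.2, 0.0]   # fraction heard as vowel A
    changed = [1.0, 1.0, 0.6, 0.0]     # same stimuli, other parameter altered
    print("F2 of first stimulus:", stimulus_f2(f1_values[0], ratio=3.0))
    print("reference boundary (Hz):", boundary_f1(f1_values, reference))
    print("boundary shift (Hz):", boundary_shift(f1_values, reference, changed))
```

Under these assumptions, a nonzero shift would indicate that changing the other parameter moved the perceptual boundary along the constant-ratio line, which is the kind of effect the abstract uses to judge the importance of pitch and higher formants.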