Author :
Hisagi, Mitsugu ; Saitoh, Takeshi ; Konishi, Ryosuke
Abstract :
Recently, speech recognition using visual information, and especially word recognition, has attracted significant interest. The recognition targets include not only words but also vowels. However, because vowel frames contain few phonemes, the recognition rate for vowels is lower than that for words. Some research has been done on vowel recognition. This paper analyses features that can effectively recognize the five Japanese vowels. Our method proceeds as follows. First, the sampled active contour model is applied to detect the lip contour in the input image sequences. The lip size is then normalized. Next, various features, such as shape, diameter, and approximate function features, are calculated, and their recognition rates are compared using the k-nearest neighbor method. Two experiments, speaker-dependent recognition and speaker-independent recognition, are carried out with five subjects. We calculated eleven feature sets and found that the feature set including the area and aspect ratio is the most effective. In speaker-dependent recognition we obtained a recognition rate of 96.6%, and in speaker-independent recognition we obtained 82.4%.
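The normalization, area/aspect-ratio features, and k-nearest neighbor classification described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Lip contours are assumed to be already available as closed polygons (the sampled active contour model is not reproduced). Size normalization is assumed to mean centering the contour and dividing by a per-speaker reference width (`ref_width`, hypothetical). Aspect ratio is taken as bounding-box height over width, and each vowel sample is assumed to reduce to a single (area, aspect ratio) vector. All function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def normalize_contour(contour: list[tuple[float, float]], ref_width: float) -> list[tuple[float, float]]:
    """Center a lip contour and scale it by a reference width.

    Hypothetical normalization: the paper only states that lip size is normalized.
    """
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    return [((x - cx) / ref_width, (y - cy) / ref_width) for x, y in contour]


def polygon_area(contour: list[tuple[float, float]]) -> float:
    """Area enclosed by a closed polygon, via the shoelace formula."""
    s = 0.0
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]  # wrap around to close the polygon
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def aspect_ratio(contour: list[tuple[float, float]]) -> float:
    """Bounding-box height divided by width (assumed definition)."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)
    if width == 0:
        raise ValueError("degenerate contour with zero width")
    return (max(ys) - min(ys)) / width


def lip_features(contour: list[tuple[float, float]], ref_width: float) -> tuple[float, float]:
    """Return the (area, aspect ratio) feature vector of a normalized contour."""
    c = normalize_contour(contour, ref_width)
    return (polygon_area(c), aspect_ratio(c))


def knn_classify(query: tuple[float, ...], train: list[tuple[tuple[float, ...], str]], k: int = 3) -> str:
    """Plain k-nearest neighbor vote using Euclidean distance.

    Ties in the vote go to the label seen first among the nearest neighbors,
    because Counter keeps insertion order for equal counts.
    """
    ranked = sorted(train, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    def rect(w: float, h: float) -> list[tuple[float, float]]:
        # Toy rectangular "lip contours" standing in for detected contours.
        return [(0.0, 0.0), (w, 0.0), (w, h), (0.0, h)]

    train = [
        (lip_features(rect(40, 30), 50), "a"),
        (lip_features(rect(50, 10), 50), "i"),
        (lip_features(rect(20, 15), 50), "u"),
    ]
    print(knn_classify(lip_features(rect(38, 28), 50), train, k=1))
```

Because aspect ratio is scale-invariant while area is not, the normalization step matters mainly for the area feature; with real data the two features would usually also be standardized to comparable ranges before Euclidean distances are computed.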
Keywords :
image sequences; natural languages; speech recognition; Japanese vowel recognition; aspect ratio; k nearest neighbor method; lip contour detection; sampled active contour model; speaker dependent recognition; speaker independent recognition; visual information; vowel recognition rate; word recognition; Active contours; Data mining; Image recognition; Image sequences; Lips; Nearest neighbor searches; Shape; Speech analysis; Speech recognition; Target recognition;