DocumentCode
1090256
Title
On creating reference templates for speaker independent recognition of isolated words
Author
Rabiner, Lawrence R.
Author_Institution
Bell Laboratories, Murray Hill, NJ
Volume
26
Issue
1
fYear
1978
fDate
2/1/1978
Firstpage
34
Lastpage
42
Abstract
The three aspects of a statistical approach to a pattern recognition problem are the selection of features, the choice of a similarity measure, and the method for creating the reference templates (patterns) used in the statistical tests. This paper discusses a philosophy for creating reference templates for a speaker-independent, isolated-word recognition system. Although many questions remain unanswered about how to select appropriate features for recognition and how to measure similarity between sets of features, such issues are not discussed here. Instead, we concentrate on methods for creating the reference templates. In particular, a method of combining word patterns from a number of speakers is proposed in which a clustering type of analysis is used to determine which patterns are merged to create a word template. The creation of multiple templates, based on this method, is discussed and is shown to be of substantial value for as few as eight speakers in the training set. To test the ideas proposed here, a word recognition system with a 54-word vocabulary was implemented. All input words were recorded off a standard telephone line. The features used were the LPC coefficients of an 8-pole analysis, and the simple Itakura distance measure was used to measure similarity between patterns. With word templates obtained as described above, a recognition accuracy of 85 percent was obtained in a forced-choice recognition test on the 54-word vocabulary using eight new speakers. The correct word was within the top five choices 98 percent of the time. With a strategy in which all the training words were used to create the templates, the recognition accuracy fell to 77 percent, and the correct word was within the top five choices only 89 percent of the time.
Keywords
Band pass filters; Cepstral analysis; Energy measurement; Filtering theory; Frequency domain analysis; Frequency measurement; Linear predictive coding; Pattern recognition; Speech; Time measurement;
fLanguage
English
Journal_Title
Acoustics, Speech and Signal Processing, IEEE Transactions on
Publisher
IEEE
ISSN
0096-3518
Type
jour
DOI
10.1109/TASSP.1978.1163037
Filename
1163037