• DocumentCode
    1090256
  • Title
    On creating reference templates for speaker independent recognition of isolated words
  • Author
    Rabiner, Lawrence R.
  • Author_Institution
    Bell Laboratories, Murray Hill, NJ
  • Volume
    26
  • Issue
    1
  • fYear
    1978
  • fDate
    2/1/1978 12:00:00 AM
  • Firstpage
    34
  • Lastpage
    42
  • Abstract
    The three aspects of a statistical approach to a pattern recognition problem are the selection of features, choice of a measure of similarity, and a method for creating the reference templates (patterns) used in the statistical tests. This paper discusses a philosophy for creating reference templates for a speaker independent, isolated word recognition system. Although there remain many unanswered questions both about how to select appropriate features for recognition, and how to measure similarity between sets of features, such issues are not discussed here. Instead we concentrate on methods for creating the reference templates. In particular, a method of combining word patterns from a number of speakers is proposed in which a clustering type of analysis is used to determine which patterns are merged to create a word template. The creation of multiple templates, based on this method, is discussed and is shown to be of substantial value for as few as eight speakers in the training set. To test the ideas proposed here, a 54 word vocabulary word recognition system was implemented. All input words were recorded off a standard telephone line. The features used were the LPC coefficients of an 8-pole analysis, and the simple Itakura distance measure was used to measure similarity between patterns. With word templates obtained as described above, recognition accuracies of 85 percent were obtained in a forced choice recognition test on the 54 word vocabulary using eight new speakers. The correct word was within the top five choices 98 percent of the time. Using a strategy in which all the training words were used to create the templates, the recognition accuracy fell to 77 percent, and the correct word was within the top five choices only 89 percent of the time.
  • Keywords
    Band pass filters; Cepstral analysis; Energy measurement; Filtering theory; Frequency domain analysis; Frequency measurement; Linear predictive coding; Pattern recognition; Speech; Time measurement
  • fLanguage
    English
  • Journal_Title
    Acoustics, Speech and Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0096-3518
  • Type
    jour
  • DOI
    10.1109/TASSP.1978.1163037
  • Filename
    1163037