• DocumentCode
    971179
  • Title

    Isolated word recognition by neural network models with cross-correlation coefficients for speech dynamics

  • Author

    Wu, Jianxiong ; Chan, Chorkin

  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Univ., Hong Kong
  • Volume
    15
  • Issue
    11
  • fYear
    1993
  • fDate
    11/1/1993 12:00:00 AM
  • Firstpage
    1174
  • Lastpage
    1185
  • Abstract
    This paper presents an artificial neural network (ANN) for speaker-independent isolated word speech recognition. The network consists of three subnets in concatenation. The static information within one frame of the speech signal is processed in the probabilistic mapping subnet, which converts an input vector of acoustic features into a probability vector whose components are the estimated probabilities that the feature vector belongs to each of the phonetic classes that constitute the words in the vocabulary. The dynamics capturing subnet computes the first-order cross correlation between the components of the probability vectors to serve as the discriminative feature derived from the interframe temporal information of the speech signal. These dynamic features are passed for decision-making to the classification subnet, which is a multilayer perceptron (MLP). The architectures of these three subnets are described, and the associated adaptive learning algorithms are derived. The recognition results for a subset of the DARPA TIMIT speech database are reported. The correct recognition rate of the proposed ANN system is 95.5%, whereas that of the best continuous hidden Markov model (HMM)-based system is only 91.0%.
  • Keywords
    correlation methods; decision theory; feedforward neural nets; learning (artificial intelligence); probability; speech recognition; DARPA TIMIT speech database; associated adaptive learning; classification subnet; concatenation; cross-correlation coefficients; decision-making; feature vector; hidden Markov model; interframe temporal information; multilayer perceptron; neural network models; probabilistic mapping subnet; speaker-independent isolated word speech recognition; speech dynamics; Artificial neural networks; Decision making; Hidden Markov models; Multilayer perceptrons; Neural networks; Signal mapping; Signal processing; Speech processing; Speech recognition; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/34.244678
  • Filename
    244678
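
The dynamics capturing step described in the abstract can be illustrated with a small sketch. It assumes that "first-order cross correlation" means the average product of class probabilities in consecutive frames (lag 1); the record does not give the paper's exact formula, so this definition, the function name lag1_cross_correlation, and the toy input are all hypothetical and for illustration only.

```python
from typing import List, Sequence


def lag1_cross_correlation(probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Average lag-1 cross products of per-frame class probabilities.

    probs[t][k] is the estimated probability of phonetic class k at frame t.
    Entry [i][j] of the result is the mean over t of probs[t][i] * probs[t+1][j].
    This lag-1 definition is an assumption, not the paper's stated formula.
    """
    n_frames = len(probs)
    if n_frames < 2:
        raise ValueError("at least two frames are needed for a lag-1 statistic")
    n_classes = len(probs[0])
    corr = [[0.0] * n_classes for _ in range(n_classes)]
    # Accumulate products between each frame and the frame that follows it.
    for prev, nxt in zip(probs, probs[1:]):
        for i in range(n_classes):
            for j in range(n_classes):
                corr[i][j] += prev[i] * nxt[j]
    scale = 1.0 / (n_frames - 1)
    return [[value * scale for value in row] for row in corr]


# Toy input: three frames, two classes, moving from class 0 to class 1.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
features = lag1_cross_correlation(frames)
# Flatten the K x K matrix into a fixed-length vector for a downstream classifier.
feature_vector = [value for row in features for value in row]
print(feature_vector)
```

Under this assumption, the result has a fixed size (K x K for K phonetic classes) regardless of how many frames the word spans, which is one plausible reason such a feature could be fed to an MLP with a fixed input layer.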