  • DocumentCode
    2289071
  • Title
    Language processing for Chinese speech recognition
  • Author
    Huang, Tingwen; Jiang, Yizhang
  • Author_Institution
    Inst. of Autom., Acad. Sinica, Beijing
  • fYear
    1994
  • fDate
    13-16 Apr 1994
  • Firstpage
    151
  • Abstract
    A language processing method based on a statistical model is proposed and studied. Unlike the conventional bigram model, this method maps all the words in the vocabulary into several equivalence classes according to the collocation between adjacent words. This modified bigram approach retains the simplicity and effectiveness of the bigram model while reducing the memory requirement enough to make the approach realizable on a PC. It can also mitigate the problem of zero probabilities. All the parameters of the model are estimated from a corpus of 1.6 million words, covering a lexicon of 30,000 words. The modified model is trained automatically with an unsupervised learning procedure. Several tests of decoding Chinese syllable strings to text have been carried out. The test results show that the average word correct rate is 87%. For news reports, a word correct rate as high as 96% is reached with this modified bigram model.
  • Keywords
    linguistics; natural languages; speech recognition; statistical analysis; unsupervised learning; Chinese speech recognition; Chinese syllable strings; adjacent word; average words correct rate; collocation; decoding; equivalence classes; language processing method; lexicon; memory size; modified bigram approach; statistical model; training; unsupervised learning; vocabulary; zero probabilities; Acoustic testing; Business communication; Character recognition; Materials testing; Natural languages; Parameter estimation; Phase estimation; Probability; Speech recognition; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech, Image Processing and Neural Networks, 1994. Proceedings, ISSIPNN '94., 1994 International Symposium on
  • Print_ISBN
    0-7803-1865-X
  • Type
    conf
  • DOI
    10.1109/SIPNN.1994.344944
  • Filename
    344944
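
The abstract describes a bigram model over equivalence classes of words but does not give its equations. As a rough illustration only, the sketch below assumes the common class-based factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), which may differ from the authors' formulation. The function names, the toy English corpus, and the hand-written `word2class` mapping are all hypothetical; in the paper the classes are learned by an unsupervised procedure from a 1.6-million-word Chinese corpus.

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Count class-to-class transitions and word-within-class frequencies."""
    class_bigrams = Counter()   # (previous class, current class) -> count
    class_context = Counter()   # class -> times it appears as a left context
    word_counts = Counter()     # word -> count
    class_counts = Counter()    # class -> total count of its member words
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            word_counts[w] += 1
            class_counts[c] += 1
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            class_context[prev] += 1
    return class_bigrams, class_context, word_counts, class_counts

def class_bigram_prob(prev_word, word, word2class, model):
    """Assumed factorization: P(class | previous class) * P(word | class)."""
    class_bigrams, class_context, word_counts, class_counts = model
    c_prev, c_cur = word2class[prev_word], word2class[word]
    if class_context[c_prev] == 0 or class_counts[c_cur] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c_cur)] / class_context[c_prev]
    p_word = word_counts[word] / class_counts[c_cur]
    return p_class * p_word

# Hypothetical toy data; the classes here are assigned by hand, not learned.
word2class = {"the": "DET", "a": "DET", "cat": "NOUN",
              "dog": "NOUN", "runs": "VERB", "sleeps": "VERB"}
sentences = [["the", "cat", "runs"],
             ["a", "dog", "sleeps"],
             ["the", "dog", "runs"]]

model = train_class_bigram(sentences, word2class)

# The word pair ("a", "cat") never occurs in the toy corpus, so a plain
# word bigram would assign it zero probability; the class model does not.
p = class_bigram_prob("a", "cat", word2class, model)
print(p)
```

In this toy setting the table of class transitions holds at most 3 × 3 entries regardless of vocabulary size, which reflects the memory saving the abstract mentions, and unseen word pairs inherit probability mass from their classes.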