DocumentCode
2289071
Title
Language processing for Chinese speech recognition
Author
Huang, Tingwen ; Jiang, Yizhang
Author_Institution
Inst. of Autom., Acad. Sinica, Beijing
fYear
1994
fDate
13-16 Apr 1994
Firstpage
151
Abstract
A language processing method based on a statistical model is proposed and studied. Unlike the conventional bigram model, this method maps all the words in the vocabulary into several equivalence classes according to the collocation between adjacent words. This modified bigram approach retains the simplicity and effectiveness of the bigram model while reducing the memory requirement, which makes the approach realizable on a PC. It also mitigates the problem of zero probabilities. All the parameters of the model are estimated from a corpus of 1.6 million words, covering a lexicon of 30,000 words. The modified model is trained automatically with an unsupervised learning procedure. Several tests of decoding Chinese syllable strings to text have been carried out. The test results show that the average word correct rate is 87%. For news reports, a word correct rate as high as 96% is reached with this modified bigram model.
Keywords
linguistics; natural languages; speech recognition; statistical analysis; unsupervised learning; Chinese speech recognition; Chinese syllable strings; adjacent word; average words correct rate; collocation; decoding; equivalence classes; language processing method; lexicon; memory size; modified bigram approach; statistical model; training; unsupervised learning; vocabulary; zero probabilities; Acoustic testing; Business communication; Character recognition; Materials testing; Natural languages; Parameter estimation; Phase estimation; Probability; Speech recognition; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech, Image Processing and Neural Networks, 1994. Proceedings, ISSIPNN '94., 1994 International Symposium on
Print_ISBN
0-7803-1865-X
Type
conf
DOI
10.1109/SIPNN.1994.344944
Filename
344944