Recognition of Phonemes In a Continuous Speech Stream By Means of PARCOR Parameter In LPC Vocoder

Author

Cui, Ying ; Takaya, Kunio

Author_Institution

Univ. of Saskatchewan, Saskatoon

fYear

2007

fDate

22-26 April 2007

Firstpage

1606

Lastpage

1609

Abstract

Linear Predictive Coding (LPC) has been used to compress and encode speech signals for digital transmission at a low bit rate. LPC determines an FIR system that predicts a speech sample from past samples by minimizing the squared error between the actual sample and its estimate. The coefficients of the FIR system are encoded and sent. At the receiving end, the inverse system, called the AR model, is excited by a random signal to reproduce the encoded speech. The use of LPC can be extended to speech recognition, since the FIR coefficients condense the information in a speech segment of typically 10 ms to 30 ms. The PARCOR parameters associated with LPC, which represent a vocal tract model based on a lattice filter structure, are considered here for speech recognition. The use of the FIR coefficients and the frequency response of the AR model was investigated previously [1]. This paper reports a method to detect a limited number of phonemes in a continuous stream of speech. The system under development slides a 16 ms time window along the signal and calculates the PARCOR parameters continuously, feeding them to a classifier. The classifier is supervised, so it requires training, and it uses the Maximum Likelihood Decision Rule. Training uses the TIMIT speech database, which contains recordings of 630 speakers of 8 major dialects of American English. Classification results are listed for some typical vowel and consonant phonemes segmented from continuous speech. The correct classification rates for vowels and consonants are 65.22% and 93.51%, respectively. Overall, the results indicate that the PARCOR parameters have the potential to characterize phonemes.
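
The pipeline described in the abstract (sliding window, PARCOR parameters, maximum likelihood decision) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the model order of 10, the 4 ms hop, the Hamming window, and the choice of Gaussian class-conditional densities are all assumptions made for illustration; the abstract specifies only the 16 ms window and the decision rule. The 16 kHz default matches the TIMIT sampling rate. The sign of the reflection coefficients depends on convention; here the prediction-error filter is A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

```python
import numpy as np

def parcor(frame, order=10):
    """Reflection (PARCOR) coefficients of one frame via Levinson-Durbin."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Biased autocorrelation for lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    k_out = np.zeros(order)
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: leave remaining k at 0
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        k_out[i - 1] = k
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return k_out

def sliding_parcor(signal, fs=16000, win_ms=16.0, hop_ms=4.0, order=10):
    """PARCOR vector for every window position (hop size is an assumption)."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    starts = range(0, len(signal) - win + 1, hop)
    return np.array([parcor(signal[s:s + win], order) for s in starts])

def fit_gaussians(features, labels):
    """Per-class mean and covariance (Gaussian densities are an assumption)."""
    labels = np.asarray(labels)
    models = {}
    for c in sorted(set(labels.tolist())):
        x = features[labels == c]
        # Small diagonal term keeps the covariance invertible.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        models[c] = (x.mean(axis=0), cov)
    return models

def classify_ml(models, x):
    """Pick the class with the largest Gaussian log-likelihood for vector x."""
    def loglik(mean, cov):
        d = x - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.solve(cov, d) + logdet)
    return max(models, key=lambda c: loglik(*models[c]))
```

In use, `sliding_parcor` would be applied to labelled phoneme segments to build the training matrix for `fit_gaussians`, and `classify_ml` would then be called on each PARCOR vector produced as the window moves along new speech.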

Keywords

FIR filters; autoregressive processes; linear predictive coding; maximum likelihood decoding; speech coding; speech recognition; vocoders; AR model; American English; FIR system; LPC vocoder; PARCOR parameter; TIMIT speech database; continuous speech stream; frequency response; lattice filter; linear predictive coding; maximum likelihood decision rule; phonemes recognition; speech recognition; speech signal compression; speech signals; supervised classifier; vocal tract model; Bit rate; Finite impulse response filter; Frequency response; Lattices; Linear predictive coding; Maximum likelihood detection; Maximum likelihood estimation; Speech coding; Speech recognition; Vocoders;

fLanguage

English

Publisher

IEEE

Conference_Titel

Electrical and Computer Engineering, 2007. CCECE 2007. Canadian Conference on

Conference_Location

Vancouver, BC

ISSN

0840-7789

Print_ISBN

1-4244-1020-7

Electronic_ISBN

0840-7789

Type

conf

DOI

10.1109/CCECE.2007.402

Filename

4233061