Non-intrusive objective speech quality assessment using a combination of MFCC, PLP and LSF features

Author

Dubey, Rajesh Kumar ; Kumar, Ajit

Author_Institution

Dept. of Electron. & Commun. Eng., Jaypee Inst. of Inf. Technol., Noida, India

fYear

2013

fDate

12-14 Dec. 2013

Firstpage

297

Lastpage

302

Abstract

In non-intrusive speech quality assessment, the original clean speech signal is not used as a reference; quality is estimated from the received degraded speech alone. Perceptual linear prediction (PLP) coefficients and Mel-frequency cepstral coefficients (MFCC) capture how the human auditory system processes and perceives speech signals. Line spectral frequency (LSF) features carry intrinsic information about the formant structure of phonemes, which is related to the resonance frequencies of the speaker's vocal tract during articulation. The combined PLP, MFCC and LSF features, together with the subjective mean opinion scores (MOS) of the speech utterances, are used to train a joint Gaussian mixture model (GMM) with the expectation-maximization (EM) algorithm. The resulting joint GMM parameters and the combined PLP, MFCC and LSF features are then used to estimate the objective MOS of the speech utterances. The correlation between the subjective and the estimated objective MOS serves as the figure of merit for the speech quality assessment algorithm. To show the efficacy of the method, the correlation and root mean square error (RMSE) between the subjective and the estimated objective MOS are compared with those of ITU-T Recommendation P.563, the standard for non-intrusive speech quality assessment, on the ITU-T Supplement 23, NOIZEUS-960 and NOIZEUS-2240 databases.
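
The estimation and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the joint GMM has already been trained, and it reduces the combined feature vector to a single scalar feature so that no matrix algebra is needed. The function names and the component fields (w, mean_x, mean_y, var_x, cov_xy) are hypothetical. The objective MOS is taken as the conditional mean of the MOS given the feature under the joint model, a common way to use a joint GMM as an estimator; the abstract does not spell out the exact estimator used.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate_mos(x, components):
    """Conditional-mean estimate of MOS given a scalar feature x.

    `components` is a list of dicts describing a joint GMM over
    (feature, MOS). Assumed, illustrative fields:
      w       mixture weight
      mean_x  feature mean        mean_y  MOS mean
      var_x   feature variance    cov_xy  feature/MOS covariance
    """
    # Unnormalized posterior probability of each component given x.
    resp = [c["w"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(resp)  # can underflow to 0 if x is far from every component
    estimate = 0.0
    for r, c in zip(resp, components):
        # Per-component linear regression of MOS on the feature.
        cond_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        estimate += (r / total) * cond_mean
    return estimate


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))
```

With a real feature vector, var_x becomes a covariance matrix and the ratio cov_xy / var_x becomes a product with its inverse; the scalar case keeps the same structure. The pearson and rmse helpers correspond to the two figures of merit named in the abstract, computed between subjective and estimated objective MOS.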

Keywords

Gaussian processes; cepstral analysis; mean square error methods; mixture models; quality of service; speech processing; EM algorithm; GMM; Gaussian mixture model; ITU-T Recommendation P.563 standard; ITU-T supplement-23; LSF feature; MFCC feature; MOS; NOIZEUS-2240 database; NOIZEUS-960 database; PLP feature; RMSE; expectation maximization algorithm; human auditory system; line spectral frequency feature; mean opinion score; mel-frequency cepstral coefficient feature; nonintrusive speech quality assessment; perceptual linear prediction coefficient; quality estimation; resonance frequency; root mean square error; speech utterance; vocal tract; Databases; Mel frequency cepstral coefficient; Quality assessment; Speech; Speech coding; Speech processing; Vectors; Expectation maximization; Gaussian mixture model; Line spectral frequencies; Non-intrusive; Objective MOS; Perceptual linear prediction; Speech quality; Subjective MOS;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing and Communication (ICSC), 2013 International Conference on

Conference_Location

Noida

Print_ISBN

978-1-4799-1605-4

Type

conf

DOI

10.1109/ICSPCom.2013.6719801

Filename

6719801