Non-intrusive speech quality assessment using multi-resolution auditory model features for degraded narrowband speech

Author

Dubey, Rajesh Kumar ; Kumar, Arun

Author_Institution

Center for Appl. Res. in Electron., Indian Inst. of Technol.-Delhi, New Delhi, India

Volume

9

Issue

9

fYear

2015

Firstpage

638

Lastpage

646

Abstract

A multi-resolution framework based on an auditory perception-based wavelet packet transform is incorporated into a multi-resolution auditory model (MRAM) and used for non-intrusive objective speech quality estimation. The MRAM provides a more detailed time-frequency model of the human auditory system than earlier models used for non-intrusive speech quality estimation. The objective mean opinion score (MOS) of a degraded narrowband speech utterance is estimated with a Gaussian mixture model (GMM) probabilistic approach using an MRAM-based feature vector. For comparison with the MRAM features, features based on a recent auditory model (Lyons' auditory model), mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF) are also used independently. The combination of MFCC and LSF features with MRAM features for non-intrusive speech quality estimation using the GMM probabilistic approach is proposed and investigated. The performance of these feature vectors is evaluated and compared with ITU-T Recommendation P.563 and a recently published work by computing the correlation coefficient and the root-mean-square error between the subjective MOS and the estimated objective MOS. The proposed method, which combines MRAM, MFCC, and LSF feature vectors for non-intrusive speech quality estimation, is found to perform better than both of the other algorithms.
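The two figures of merit named in the abstract can be illustrated with a minimal sketch. It assumes the correlation coefficient is the Pearson coefficient and that scores are compared one pair at a time; the abstract states neither. The function names and the MOS values below are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(subjective, objective):
    """Pearson correlation between subjective and estimated MOS lists.

    Assumption: the abstract says only "correlation coefficient";
    Pearson is used here for illustration.
    """
    n = len(subjective)
    mean_s = sum(subjective) / n
    mean_o = sum(objective) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((s - mean_s) * (o - mean_o)
              for s, o in zip(subjective, objective))
    # Sums of squared deviations (unnormalised variances).
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_o = sum((o - mean_o) ** 2 for o in objective)
    return cov / math.sqrt(var_s * var_o)


def rmse(subjective, objective):
    """Root-mean-square error between subjective and estimated MOS."""
    n = len(subjective)
    squared_error = sum((s - o) ** 2
                        for s, o in zip(subjective, objective))
    return math.sqrt(squared_error / n)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not data from the paper).
    subjective_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
    estimated_mos = [1.8, 2.1, 2.7, 3.9, 4.4]
    print("correlation:", pearson_correlation(subjective_mos, estimated_mos))
    print("RMSE:", rmse(subjective_mos, estimated_mos))
```

A higher correlation and a lower root-mean-square error both indicate closer agreement between the estimated objective MOS and the subjective MOS.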

Keywords

Gaussian processes; cepstral analysis; feature extraction; mixture models; probability; speech processing; wavelet transforms; GMM probabilistic approach; Gaussian mixture model probabilistic approach; Lyons auditory model; MFCC features; MRAM-based feature vector; auditory perception-based wavelet packet transform; degraded narrowband speech utterance; human auditory system; line spectral frequencies features; mel-frequency cepstral coefficients; multiresolution auditory model features; nonintrusive objective speech quality estimation; nonintrusive speech quality assessment; objective mean opinion score; time-frequency modelling

fLanguage

English

Journal_Title

Signal Processing, IET

Publisher

iet

ISSN

1751-9675

Type

jour

DOI

10.1049/iet-spr.2014.0214

Filename

7348909