DocumentCode
3611955
Title
Non-intrusive speech quality assessment using multi-resolution auditory model features for degraded narrowband speech
Author
Dubey, Rajesh Kumar ; Kumar, Arun
Author_Institution
Centre for Applied Research in Electronics, Indian Institute of Technology Delhi, New Delhi, India
Volume
9
Issue
9
fYear
2015
Firstpage
638
Lastpage
646
Abstract
A multi-resolution framework using an auditory perception-based wavelet packet transform is invoked in a multi-resolution auditory model (MRAM) and used for non-intrusive objective speech quality estimation. The MRAM provides a more detailed time-frequency modelling of the human auditory system than earlier models used for non-intrusive speech quality estimation. The objective mean opinion score (MOS) of a degraded narrowband speech utterance is estimated by a Gaussian mixture model (GMM) probabilistic approach using an MRAM-based feature vector. Additionally, features based on a recent auditory model (Lyon's auditory model), mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF) are each used independently as baselines against which the performance of the MRAM features is compared. The combination of MFCC and LSF features with MRAM features for non-intrusive speech quality estimation using the GMM probabilistic approach is proposed and investigated. The performance of these feature vectors is evaluated and compared with ITU-T Recommendation P.563 and a recently published work by computing the correlation coefficient and root-mean-square error between the subjective MOS and the estimated objective MOS. The proposed method, which uses a combination of MRAM, MFCC, and LSF feature vectors for non-intrusive speech quality estimation, is found to perform better than both of the other algorithms.
Keywords
Gaussian processes; cepstral analysis; feature extraction; mixture models; probability; speech processing; wavelet transforms; GMM probabilistic approach; Gaussian mixture model probabilistic approach; Lyons auditory model; MFCC features; MRAM-based feature vector; auditory perception-based wavelet packet transform; degraded narrowband speech utterance; human auditory system; line spectral frequencies features; mel-frequency cepstral coefficients; multiresolution auditory model features; nonintrusive objective speech quality estimation; nonintrusive speech quality assessment; objective mean opinion score; time-frequency modelling
fLanguage
English
Journal_Title
Signal Processing, IET
Publisher
IET
ISSN
1751-9675
Type
jour
DOI
10.1049/iet-spr.2014.0214
Filename
7348909