• DocumentCode
    677139
  • Title
    Non-intrusive objective speech quality assessment using a combination of MFCC, PLP and LSF features
  • Author
    Dubey, Rajesh Kumar ; Kumar, Ajit
  • Author_Institution
    Dept. of Electron. & Commun. Eng., Jaypee Inst. of Inf. Technol., Noida, India
  • fYear
    2013
  • fDate
    12-14 Dec. 2013
  • Firstpage
    297
  • Lastpage
    302
  • Abstract
    In non-intrusive speech quality assessment, the original clean speech signal is not used as a reference; only the received degraded speech is used for quality estimation. Perceptual linear prediction (PLP) coefficients and Mel frequency cepstral coefficients (MFCC) capture how the human auditory system processes and perceives speech signals. Line spectral frequency (LSF) features carry intrinsic information about the formant structure of phonemes, which is related to the resonance frequencies of the speaker's vocal tract during articulation. The combined PLP, MFCC and LSF features, together with the subjective mean opinion scores (MOS) of the speech utterances, are used to train a joint Gaussian mixture model (GMM) with the expectation maximization (EM) algorithm. The resulting joint GMM parameters and the combined PLP, MFCC and LSF features are then used to estimate the objective MOS of the speech utterances. The correlation between the subjective and the estimated objective MOS serves as the figure of merit for the speech quality assessment algorithm. To show the efficacy of the method, its correlation and root mean square error (RMSE) between the subjective and the estimated objective MOS are compared with those of ITU-T Recommendation P.563, the standard for non-intrusive speech quality assessment, on the ITU-T Supplement 23, NOIZEUS-960 and NOIZEUS-2240 databases.
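    The estimation step described in the abstract can be read as taking the conditional mean of the MOS given the feature vector under the joint GMM. The sketch below illustrates that reading only; it is not the authors' implementation. The function name `estimate_mos`, the parameter layout (each component mean and covariance covers the concatenated vector [features, MOS]) and the toy numbers are assumptions made for illustration, and the EM training step is assumed to have already produced the parameters.

```python
import numpy as np


def estimate_mos(x, weights, means, covs):
    """Return E[y | x] under a joint GMM over z = [x, y].

    Hypothetical helper: x is a feature vector of length d, y is a scalar
    score, and each component k has a mean of length d + 1 and a
    (d + 1) x (d + 1) covariance, with the score in the last position.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    n_comp = len(weights)
    log_resp = np.empty(n_comp)
    cond_means = np.empty(n_comp)

    for k in range(n_comp):
        mu = np.asarray(means[k], dtype=float)
        cov = np.asarray(covs[k], dtype=float)
        mu_x, mu_y = mu[:d], mu[d]
        s_xx = cov[:d, :d]   # feature covariance block
        s_yx = cov[d, :d]    # score/feature cross-covariance

        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)      # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)

        # Log of weight times the marginal Gaussian density of x.
        log_resp[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + d * np.log(2.0 * np.pi)
        )
        # Per-component linear regression of the score on the features.
        cond_means[k] = mu_y + s_yx @ sol

    # Normalise responsibilities in a numerically stable way.
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return float(resp @ cond_means)


if __name__ == "__main__":
    # Toy parameters (made up): two features, one component.
    weights = [1.0]
    means = [[0.0, 0.0, 3.0]]
    covs = [[[1.0, 0.0, 0.5],
             [0.0, 1.0, 0.0],
             [0.5, 0.0, 1.0]]]
    print(estimate_mos([1.0, 0.0], weights, means, covs))
```

    With a single component this reduces to ordinary linear regression of the score on the features; with several components the per-component predictions are blended by how well each component explains the observed features.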
  • Keywords
    Gaussian processes; cepstral analysis; mean square error methods; mixture models; quality of service; speech processing; EM algorithm; GMM; Gaussian mixture model; ITU-T Recommendation P.563 standard; ITU-T supplement-23; LSF feature; MFCC feature; MOS; NOIZEUS-2240 database; NOIZEUS-960 database; PLP feature; RMSE; expectation maximization algorithm; human auditory system; line spectral frequency feature; mean opinion score; melfrequency cepstral coefficient feature; nonintrusive speech quality assessment; perceptual linear prediction coefficient; quality estimation; resonance frequency; root mean square error; speech utterance; vocal tract; Databases; Mel frequency cepstral coefficient; Quality assessment; Speech; Speech coding; Speech processing; Vectors; Expectation maximization; Gaussian mixture model; Line spectral frequencies; Non-intrusive; Objective MOS; Perceptual linear prediction; Speech quality; Subjective MOS;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communication (ICSC), 2013 International Conference on
  • Conference_Location
    Noida
  • Print_ISBN
    978-1-4799-1605-4
  • Type
    conf
  • DOI
    10.1109/ICSPCom.2013.6719801
  • Filename
    6719801