• DocumentCode
    719599
  • Title
    Comparison of performance of the features of speech signal for non-intrusive speech quality assessment
  • Author
    Parmar, Nidhi ; Dubey, Rajesh Kumar
  • Author_Institution
    Electron. & Commun. Dept., Jaypee Inst. of Inf. Technol., Noida, India
  • fYear
    2015
  • fDate
    16-18 March 2015
  • Firstpage
    243
  • Lastpage
    248
  • Abstract
    This work compares two kinds of speech signal features for non-intrusive speech quality assessment: one based on mel-frequency cepstral coefficients (MFCC) and the other based on reconstructed phase spaces (RPS). Both are frequently used in speech recognition systems. The focus of the work is to compare the performance of RPS-based features with that of MFCC-based features for non-intrusive speech quality evaluation. MFCC closely approximates the human auditory system and is used in speech quality evaluation. RPS-based features capture the true dynamics of the original speech signal and the non-linear characteristics of the human speech production system. The RPS approach replaces conventionally used power spectrum estimation (PSE) methods such as the FFT with two-dimensional DFT methods. A Gaussian Mixture Model (GMM) is used to map these features to the mean opinion score (MOS). The features are evaluated on the ITU-T Supplement-23 database, and their performance is compared in terms of the correlation coefficient between the subjective MOS and the objective MOS.
  • Keywords
    Gaussian processes; cepstral analysis; discrete Fourier transforms; fast Fourier transforms; mixture models; speech recognition; 2D DFT methods; FFT; Gaussian mixture model; ITU-T Supplement-23 database; MFCC based features; MOS; PSE method; RPS based features; discrete Fourier transform; fast Fourier transform; human auditory system; human speech production system; mean opinion score; mel-frequency cepstral coefficients; nonintrusive speech quality assessment; power spectrum estimation method; reconstructed phase spaces; speech quality evaluation; speech recognition system; speech signal features; Auditory system; Correlation coefficient; Databases; Feature extraction; Mel frequency cepstral coefficient; Quality assessment; Speech; Expectation Maximization; Gaussian Mixture Model; Non-intrusive Speech Quality; Objective MOS; Phase Space Estimation; Reconstructed Phase Spaces; Subjective MOS;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communication (ICSC), 2015 International Conference on
  • Conference_Location
    Noida
  • Print_ISBN
    978-1-4799-6760-5
  • Type
    conf
  • DOI
    10.1109/ICSPCom.2015.7150655
  • Filename
    7150655