• DocumentCode
    667539
  • Title
    Sparse representation and epoch estimation of voiced speech
  • Author
    Gunther, Jacob ; Moon, Thomas
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Utah State Univ., Logan, UT, USA
  • fYear
    2013
  • fDate
    20-23 Oct. 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Whereas most approaches to linear speech prediction fail to account for the quasi-periodic glottal flow, this paper incorporates a model for the glottal flow derivative (GFD) directly into the linear prediction problem. A linear model for the prediction error is obtained by constructing a dictionary of time-shifted GFD pulses. The pulses are constructed by applying glottal inverse filtering (GIF) to recorded speech. Minimizing the difference between the linear prediction residual and a sparse combination of the pulses in the dictionary leads to joint estimation of the linear predictor and a sparse representation of the prediction error that reveals the instants of vocal tract excitation (epochs). The method is applied to voiced segments extracted from the CMU Arctic dataset, which also includes electro-glottograms. Results show that the proposed method is effective in estimating the parameters of interest and that GIF-based pulses model the GFD pulses occurring in real speech more accurately than pulses computed from mathematical models.
  • Keywords
    estimation theory; filtering theory; signal representation; speech processing; CMU Arctic dataset; GIF; electro-glottograms; epoch estimation; glottal flow derivative; glottal inverse filtering; linear prediction problem; linear prediction residual; linear predictor; linear speech prediction; prediction error; quasiperiodic glottal flow; sparse combination; sparse representation; time-shifted GFD pulses; vocal tract excitation; voiced segments; voiced speech; Acoustics; Dictionaries; Mathematical model; Speech; Speech coding; Speech processing; Linear speech prediction; epoch detection; sparse signal recovery;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013 IEEE Workshop on
  • Conference_Location
    New Paltz, NY
  • ISSN
    1931-1168
  • Type
    conf
  • DOI
    10.1109/WASPAA.2013.6701885
  • Filename
    6701885