DocumentCode
32580
Title
Inversion of Auditory Spectrograms, Traditional Spectrograms, and Other Envelope Representations
Author
Decorsiere, Remi ; Søndergaard, Peter L. ; MacDonald, Ewen N. ; Dau, Torsten
Author_Institution
Oticon Centre of Excellence for Hearing & Speech Sci., Tech. Univ. of Denmark, Lyngby, Denmark
Volume
23
Issue
1
fYear
2015
fDate
Jan. 2015
Firstpage
46
Lastpage
56
Abstract
Envelope representations such as the auditory or traditional spectrogram can be defined by the set of envelopes from the outputs of a filterbank. Common envelope extraction methods discard information regarding the fast fluctuations, or phase, of the signal. Thus, it is difficult to invert, or reconstruct a time-domain signal from, an arbitrary envelope representation. To address this problem, a general optimization approach in the time domain is proposed here, which iteratively minimizes the distance between a target envelope representation and that of a reconstructed time-domain signal. Two implementations of this framework are presented for auditory spectrograms, where the filterbank is based on the behavior of the basilar membrane and envelope extraction is modeled on the response of inner hair cells. One implementation is direct while the other is a two-stage approach that is computationally simpler. While both can accurately invert an auditory spectrogram, the two-stage approach performs better on time-domain metrics. The same framework is applied to traditional spectrograms based on the magnitude of the short-time Fourier transform. Inspired by human perception of loudness, a modification to the framework is proposed, which leads to a more accurate inversion of traditional spectrograms.
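The time-domain optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the envelope representation is the STFT magnitude (the "traditional spectrogram" case), the distance is a squared Euclidean error, and the update is plain fixed-step gradient descent. The function names (`stft`, `stft_adjoint`, `invert_envelopes`) and all parameter values are hypothetical. The auditory filterbank, the inner-hair-cell envelope model, and the loudness-inspired modification are not modeled here.

```python
import numpy as np

def frame_starts(n_samples, win_len, hop):
    """Start index of every full analysis frame."""
    return np.arange(0, n_samples - win_len + 1, hop)

def stft(x, window, hop):
    """Windowed frames followed by a full-length FFT (one row per frame)."""
    L = len(window)
    frames = np.stack([x[s:s + L] for s in frame_starts(len(x), L, hop)])
    return np.fft.fft(frames * window, axis=1)

def stft_adjoint(R, window, hop, n_samples):
    """Adjoint of `stft`: inverse FFT scaled by L, re-windowed, overlap-added."""
    L = len(window)
    frames = L * np.fft.ifft(R, axis=1) * window
    out = np.zeros(n_samples, dtype=complex)
    for m, s in enumerate(frame_starts(n_samples, L, hop)):
        out[s:s + L] += frames[m]
    return out

def invert_envelopes(target, n_samples, window, hop, n_iter=200, seed=0):
    """Iteratively reduce sum((|STFT(x)| - target)**2) over a real signal x.

    Returns the reconstructed signal and the cost before each update.
    """
    rng = np.random.default_rng(seed)
    x = 1e-3 * rng.standard_normal(n_samples)  # arbitrary (assumed) starting point

    # For this analysis operator A, A^H A is diagonal with entries
    # L * (overlap-added squared window), which gives a safe fixed step size.
    L = len(window)
    w2 = np.zeros(n_samples)
    for s in frame_starts(n_samples, L, hop):
        w2[s:s + L] += window ** 2
    step = 1.0 / (2.0 * L * w2.max())

    costs = []
    for _ in range(n_iter):
        X = stft(x, window, hop)
        mag = np.abs(X)
        costs.append(float(np.sum((mag - target) ** 2)))
        phase = X / np.maximum(mag, 1e-12)  # unit-modulus phase, guarded at zero
        residual = (mag - target) * phase
        grad = 2.0 * np.real(stft_adjoint(residual, window, hop, n_samples))
        x = x - step * grad
    return x, costs
```

As a usage example, a target can be computed from a known signal with `target = np.abs(stft(x_true, window, hop))` and then passed to `invert_envelopes(target, len(x_true), window, hop)`. The recorded costs show how the envelope error evolves across iterations. Because the phase is discarded, the reconstruction can match the target envelopes without matching the original waveform sample by sample.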
Keywords
Fourier transforms; audio signal processing; channel bank filters; feature extraction; optimisation; signal reconstruction; signal representation; arbitrary envelope representation; auditory spectrogram inversion; envelope extraction methods; filter bank; general optimization approach; human perception; inner hair cells; short-time Fourier transform; time-domain signal reconstruction; two-stage approach; Optimization; Psychoacoustic models; Spectrogram; Speech; Speech processing; Time-domain analysis; Spectrogram inversion; auditory spectrogram; gradient methods
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2014.2367821
Filename
6949659