DocumentCode :
610130
Title :
Evaluation of Efficient Compression Properties of the Complete Oscillator Method, Part 2: Speech Coding
Author :
Yen, A. ; Gorodnitsky, I.
Author_Institution :
SPAWAR Syst. Center, San Diego, CA, USA
fYear :
2013
fDate :
20-22 March 2013
Firstpage :
531
Lastpage :
531
Abstract :
Summary form only given. This paper examines the performance of the recently proposed Complete Oscillator Method (COM) in the context of coding speech. The COM is shown to provide several advantages over traditional predictive coding techniques. Unlike the cascaded method employed by codecs such as Adaptive Multi-Rate (AMR), the COM encodes short and long-term data features jointly using a single, flexible representation. Joint approaches have previously been shown to yield efficiency gains [1]. Furthermore, the COM does not always require an explicit encoding of the residual error to reconstruct the signal. As AMR can allocate as much as 85% of its coding budget towards encoding the residual, there is substantial motivation for finding alternatives to source-filter coding methods. The first part of the paper compares the synthesis of speech frames using the COM versus a combination of linear predictor and adaptive codebook (LPAC) in order to assess the deterministic modeling capabilities of the COM relative to linear predictive codes. With both approaches optimized by minimizing the perceptually-weighted error (PWE) between the original and reconstructed speech, the COM is shown to achieve lower PWE on average than LPAC as implemented in the AMR standard for several types of speech. The COM improved PWE in 78.20% of voiced frames yielding a 2.02 dB PWE gain on average. For voiced to unvoiced transitions, the COM improved PWE in 76.75% of the frames with a 1.26 dB average gain. For unvoiced speech, the COM consistently improved PWE but the average gain was not significant. Only for unvoiced to voiced transitions did the COM not produce gains in average PWE. The second part of the paper compares the synthesis of speech frames using the COM at several bit rates to standard AMR and Speex codecs to show that the COM can produce comparable quality speech in a significant percentage of frames. 
Using weighted spectral slope distance (WSS) as a metric, a 5.5 kbps COM was seen to outperform 12.2 kbps AMR in 24.12% of speech frames. These results are not intended to demonstrate the workings of a COM-only speech coder, but rather to suggest how existing codecs can achieve lower bit rates by using the COM to encode some subset of frames. For example, by using the COM in the lowest bit rate mode sufficient to achieve a similar WSS as 12.2 kbps AMR, the average bit rate can potentially be reduced to 9.16 kbps.
Keywords :
oscillators; signal reconstruction; signal representation; speech codecs; speech coding; speech synthesis; AMR standard; COM-only speech coder; LPAC; PWE; WSS distance; adaptive multirate standard; bit rate 12.2 kbit/s; bit rate 9.16 kbit/s; cascaded method; codecs; complete oscillator method; efficient compression property evaluation; flexible representation; gain 1.26 dB; linear predictive codes; linear predictor and adaptive codebook; perceptually-weighted error; residual error explicit encoding; single representation; source-filter coding methods; speech coding context; speech frame synthesis; voiced frames; voiced transitions; weighted spectral slope distance; Bit rate; Gain; Speech; Speech analysis; speech processing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2013
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4673-6037-1
Type :
conf
DOI :
10.1109/DCC.2013.110
Filename :
6543141