DocumentCode :
8045
Title :
Incremental Syllable-Context Phonetic Vocoding
Author :
Cernak, Milos ; Garner, Philip N. ; Lazaridis, Alexandros ; Motlicek, Petr ; Na, Xingyu
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Volume :
23
Issue :
6
fYear :
2015
fDate :
June 2015
Firstpage :
1019
Lastpage :
1030
Abstract :
Current very low bit rate speech coders are, due to complexity limitations, designed to work off-line. This paper investigates incremental speech coding that operates in real time and incrementally (i.e., encoded speech depends only on already-uttered speech, without the need for future speech information). Since human speech communication is asynchronous (i.e., different information flows are processed simultaneously), we hypothesized that such an incremental speech coder should also operate asynchronously. To accomplish this task, we describe speech coding that reflects human cortical temporal sampling, which packages information into units of different temporal granularity, such as phonemes and syllables, in parallel. More specifically, a phonetic vocoder (cascaded speech recognition and synthesis systems) extended with syllable-based information transmission mechanisms is investigated. Two main aspects are evaluated in this work: synchronous and asynchronous coding. Synchronous coding refers to the case when the phonetic vocoder and speech generation process depend on the syllable boundaries during encoding and decoding, respectively. Asynchronous coding, on the other hand, refers to the case when the phonetic encoding and speech generation processes are done independently of the syllable boundaries. Our experiments confirmed that asynchronous incremental speech coding performs better, in terms of intelligibility and overall speech quality, mainly due to better alignment of the segmental and prosodic information. The proposed vocoding operates at an uncompressed bit rate of 213 bits/sec and achieves an average communication delay of 243 ms.
Keywords :
speech coding; speech recognition; speech synthesis; vocoders; already-uttered speech; asynchronous coding; asynchronous human speech communication; human cortical temporal sampling; incremental speech coder; incremental speech coding; incremental syllable-context; phonetic encoding; phonetic vocoding; speech generation; speech recognition; speech synthesis; syllable boundary; syllable-based information transmission; uncompressed bit rate; Decoding; Real-time systems; Speech; Speech coding; Speech processing; Vocoders; Parametric speech synthesis; very low bit rate speech coding;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2418577
Filename :
7073585