DocumentCode
3214725
Title
A fuzzy based very low bit rate speech coding with high accuracy
Author
Johnny, Milad ; Mirzaee, Javad
Author_Institution
Sch. of Electr. Eng., Iran Univ. of Sci. & Technol. (IUST), Tehran, Iran
fYear
2012
fDate
15-17 May 2012
Firstpage
1302
Lastpage
1306
Abstract
According to the U.S. Federal Standard coder for 2400 bps, a data frame containing 54 bits of encoded signal is transmitted every 22.5 ms. In each frame, 25 bits encode the spectral features (10 Line Spectrum Frequencies (LSF)). This paper describes a method to reduce the transmission rate while preserving most of the quality and intelligibility. The proposed coder operates at about 780 bits/s (= 6 bits/frame × 130 frames/s). At the transmitter, an algorithm converts speech into phonetic segments, which are then divided into voiced and unvoiced segments. Because unvoiced phonetic segments are short in duration, a listener cannot tell whether they are spoken by a male or a female speaker. The literature confirms that this observation holds in most cases. Therefore, for high-accuracy speech transmission, voiced segments are more important than unvoiced ones. Hence, a voiced/unvoiced decomposition system is proposed. Furthermore, fuzzy clustering is applied to cluster the voiced segments, with the proper number of voiced segments determined by a statistical method called “Elbow”. Depending on the transmission rate, two different strategies can be used. In the first strategy, unvoiced segments of speech are transmitted using Linear Predictive Coding (LPC) for high quality (MOS = 4.5). In the second strategy, unvoiced segments of speech are recognized and then transmitted, giving lower quality (MOS = 3) at under 100 bits/s.
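The bit-rate figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the only inputs are the numbers stated above (54 bits per 22.5 ms frame, 25 spectral bits per frame, 6 bits/frame at 130 frames/s).

```python
def rate_from_frame_period(bits_per_frame: float, frame_period_ms: float) -> float:
    """Bit rate in bits/s when one frame is sent every frame_period_ms milliseconds."""
    return bits_per_frame * 1000 / frame_period_ms


def rate_from_frame_rate(bits_per_frame: float, frames_per_s: float) -> float:
    """Bit rate in bits/s when frames_per_s frames are sent each second."""
    return bits_per_frame * frames_per_s


# Reference coder from the abstract: 54 bits every 22.5 ms.
reference_bps = rate_from_frame_period(54, 22.5)

# Proposed coder from the abstract: 6 bits/frame at 130 frames/s.
proposed_bps = rate_from_frame_rate(6, 130)

# Share of each reference frame spent on the 10 LSF spectral features.
spectral_share = 25 / 54

# How many times lower the proposed rate is than the reference rate.
reduction_factor = reference_bps / proposed_bps

print(f"reference: {reference_bps:.0f} bits/s")
print(f"proposed:  {proposed_bps:.0f} bits/s")
print(f"spectral share of reference frame: {spectral_share:.1%}")
print(f"rate reduction factor: {reduction_factor:.2f}")
```

Under these figures the reference frame structure reproduces the 2400 bits/s rate, and the proposed 780 bits/s is roughly one third of it.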
Keywords
fuzzy set theory; linear codes; pattern clustering; speech coding; speech intelligibility; speech recognition; statistical analysis; Elbow statistical method; LPC; LSF; bit rate 2400 bit/s; cluster voice segments; fuzzy based very low bit rate speech coding; fuzzy clustering; line spectrum frequencies; linear predictive coding; speech intelligibility; speech transmission; transmitter; voice phonetic segments; voiced-unvoiced decomposition system; Fuzzy segmentation; Speech coding; Voice/unvoiced decomposition; frequency formant;
fLanguage
English
Publisher
ieee
Conference_Titel
Electrical Engineering (ICEE), 2012 20th Iranian Conference on
Conference_Location
Tehran
Print_ISBN
978-1-4673-1149-6
Type
conf
DOI
10.1109/IranianCEE.2012.6292557
Filename
6292557