DocumentCode :
2038718
Title :
Speech enhancement method based on spectral filtering utilizing asynchronous signals with embedded timecode over TCP/IP based network
Author :
Masmima, Tomohisa ; Chisaki, Yoshifumi ; Usagawa, Tsuyoshi
Author_Institution :
Grad. Sch. of Sci. & Technol., Kumamoto Univ., Kumamoto, Japan
fYear :
2010
fDate :
21-24 Nov. 2010
Firstpage :
1341
Lastpage :
1346
Abstract :
An utterance training system utilizing automatic speech recognition (ASR) has been developed as a computer aided language laboratory system. Because the performance of ASR is seriously degraded by surrounding noise, many noise reduction methods have been proposed. In an utterance training system in particular, learners often sit side by side in a classroom, so each learner's utterances interfere with those of the other learners. Spectral subtraction, one such noise reduction method, suppresses stationary noise by subtracting from the signal a noise spectrum estimated from portions of the observed signal without voice activity. However, it relies on the assumption that the noise is stationary. Multi-channel methods such as Delay and Sum, Griffiths-Jim, or various Blind Source Separation methods are applicable to non-stationary noise, but they assume that all input signals are synchronized. In this paper, a time-frequency masking method utilizing signals observed at distributed computers connected over a TCP/IP network is proposed. Because of the characteristics of TCP/IP based networks, varying transmission delays are unavoidable, so signals from different computers cannot be synchronized perfectly even when a time synchronization protocol such as the Network Time Protocol is used. The proposed method is based on the assumption that the noise spectrum is stable for a certain duration. Under this assumption, the asynchronous signals observed at distributed computers are utilized for speech enhancement based on time-frequency masking. Simulation results suggest that the method can improve the performance of ASR when several interfering speakers are present around the target speaker.
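The spectral subtraction baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's proposed method: it assumes magnitude spectra have already been computed per frame and that a voice activity flag per frame is available. The function names `estimate_noise` and `spectral_subtract` and the `floor` parameter are hypothetical, chosen only for this illustration.

```python
def estimate_noise(frames, is_speech):
    """Average the magnitude spectra of the frames flagged as non-speech."""
    noise_frames = [f for f, s in zip(frames, is_speech) if not s]
    if not noise_frames:
        raise ValueError("need at least one frame without voice activity")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectral_subtract(frames, noise, floor=0.0):
    """Subtract the noise estimate from every bin, clamping at `floor`."""
    return [[max(m - n, floor) for m, n in zip(frame, noise)]
            for frame in frames]


# Toy data (illustrative only): two noise-only frames, then one speech frame.
frames = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.5]]
is_speech = [False, False, True]

noise = estimate_noise(frames, is_speech)
cleaned = spectral_subtract(frames, noise)
print(noise)       # [1.0, 2.0]
print(cleaned[2])  # [4.0, 0.5]
```

Because the noise estimate is a single average taken over non-speech frames, the sketch only works when the noise is stationary, which is the limitation the abstract points out.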
Keywords :
computer aided instruction; computer networks; filtering theory; signal denoising; speech enhancement; speech recognition; time-frequency analysis; transport protocols; Griffiths-Jim method; TCP-IP based network; asynchronous signal; automatic speech recognition; blind source separation method; computer aided language laboratory system; delay and sum method; embedded timecode; interference speaker; multichannel method; network time protocol; noise reduction method; noise spectrum estimation; spectral filtering; spectral subtraction; speech enhancement method; stationary noise suppression; target speaker; time synching protocol; time-frequency masking method; transmission delay; utterance training system;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON 2010 - 2010 IEEE Region 10 Conference
Conference_Location :
Fukuoka
ISSN :
pending
Print_ISBN :
978-1-4244-6889-8
Type :
conf
DOI :
10.1109/TENCON.2010.5686064
Filename :
5686064