DocumentCode
2038718
Title
Speech enhancement method based on spectral filtering utilizing asynchronous signals with embedded timecode over TCP/IP based network
Author
Masmima, Tomohisa ; Chisaki, Yoshifumi ; Usagawa, Tsuyoshi
Author_Institution
Grad. Sch. of Sci. & Technol., Kumamoto Univ., Kumamoto, Japan
fYear
2010
fDate
21-24 Nov. 2010
Firstpage
1341
Lastpage
1346
Abstract
An utterance training system utilizing automatic speech recognition (ASR) has been developed as a computer-aided language laboratory system. Because the performance of ASR is seriously degraded by surrounding noise, many noise reduction methods have been proposed. In an utterance training system in particular, learners often sit side by side in a classroom, so each learner's utterances degrade the other learners' utterances. Spectral subtraction, one such noise reduction method, suppresses stationary noise by subtracting a noise spectrum estimated from segments of the observed signal without voice activity. However, it relies on the assumption that the noise is stationary. Although multi-channel methods such as Delay and Sum, Griffiths-Jim, and various Blind Source Separation methods are applicable to non-stationary noise, these methods assume that all input signals are synchronized. In this paper, a time-frequency masking method utilizing signals observed at distributed computers connected over a TCP/IP network is proposed. Because of the characteristics of TCP/IP-based networks, varying transmission delays are unavoidable, so signals from different computers cannot be perfectly synchronized even when a time synchronization protocol such as the Network Time Protocol is used. The proposed method is based on the assumption that the noise spectrum is stable for a certain duration. Under this assumption, the asynchronous signals observed at the distributed computers are used for speech enhancement based on time-frequency masking. Simulation results indicate that the method may improve ASR performance when several interfering speakers are present around the target speaker.
Keywords
computer aided instruction; computer networks; filtering theory; signal denoising; speech enhancement; speech recognition; time-frequency analysis; transport protocols; Griffiths-Jim method; TCP-IP based network; asynchronous signal; automatic speech recognition; blind source separation method; computer aided language laboratory system; delay and sum method; embedded timecode; interference speaker; multichannel method; network time protocol; noise reduction method; noise spectrum estimation; spectral filtering; spectral subtraction; speech enhancement method; stationary noise suppression; target speaker; time synching protocol; time-frequency masking method; transmission delay; utterance training system;
fLanguage
English
Publisher
IEEE
Conference_Titel
TENCON 2010 - 2010 IEEE Region 10 Conference
Conference_Location
Fukuoka
ISSN
pending
Print_ISBN
978-1-4244-6889-8
Type
conf
DOI
10.1109/TENCON.2010.5686064
Filename
5686064