Speech enhancement method based on spectral filtering utilizing asynchronous signals with embedded timecode over TCP/IP based network

Author

Masmima, Tomohisa ; Chisaki, Yoshifumi ; Usagawa, Tsuyoshi

Author_Institution

Grad. Sch. of Sci. & Technol., Kumamoto Univ., Kumamoto, Japan

fYear

2010

fDate

21-24 Nov. 2010

Firstpage

1341

Lastpage

1346

Abstract

An utterance training system utilizing automatic speech recognition (ASR) has been developed as a computer-aided language laboratory system. Because the performance of ASR is seriously degraded by surrounding noise, many noise reduction methods have been proposed. In an utterance training system in particular, learners often sit side by side in a classroom, so each learner's utterances act as interference that degrades the recorded utterances of the other learners. Spectral subtraction, one such noise reduction method, suppresses stationary noise by subtracting a noise spectrum estimated from segments of the observed signal that contain no voice activity. However, it relies on the assumption that the noise is stationary. Although multi-channel methods such as delay-and-sum, Griffiths-Jim, and various blind source separation methods are applicable to non-stationary noise, they assume that all input signals are synchronized. In this paper, a time-frequency masking method is proposed that utilizes signals observed at distributed computers connected over a TCP/IP network. Because of the characteristics of TCP/IP-based networks, variable transmission delays are unavoidable, so the signals from the computers cannot be synchronized perfectly even when a time synchronization protocol such as the Network Time Protocol (NTP) is used. The proposed method assumes that the noise spectrum remains stable for a certain duration. Under this assumption, the asynchronous signals observed at the distributed computers can be used for speech enhancement based on time-frequency masking. Simulation results suggest that the method can improve ASR performance when several interfering speakers are present around the target speaker.
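The abstract does not give the exact formulation, so the following is only a minimal sketch, assuming magnitude spectra have already been computed by an STFT. It shows (a) conventional spectral subtraction with a noise spectrum averaged over non-speech frames, and (b) a simple binary time-frequency mask in which the reference spectrum recorded on another computer is averaged over a window of neighbouring frames, so that a misalignment of a few frames matters less if the noise spectrum is stable over that window. All function names, the averaging half-width, the threshold, and the spectral floor are hypothetical choices for illustration, not the authors' method.

```python
from typing import List, Sequence

Spectrum = List[float]  # magnitude spectrum of one frame, one value per frequency bin


def estimate_noise_spectrum(frames: Sequence[Spectrum], is_speech: Sequence[bool]) -> Spectrum:
    """Average the magnitude spectra of the frames flagged as having no voice activity."""
    noise_frames = [f for f, speech in zip(frames, is_speech) if not speech]
    if not noise_frames:
        raise ValueError("no non-speech frames available for noise estimation")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]


def spectral_subtraction(frame: Spectrum, noise: Spectrum, floor: float = 0.01) -> Spectrum:
    """Subtract the noise estimate bin by bin; clamp to a small fraction of the input
    (hypothetical floor) so that no bin becomes negative."""
    return [max(x - n, floor * x) for x, n in zip(frame, noise)]


def smoothed_reference(reference: Sequence[Spectrum], center: int, half_width: int) -> Spectrum:
    """Average the reference frames around `center`. Assumption: the interfering
    spectrum is stable over this window, so exact frame alignment is not needed."""
    center = min(center, len(reference) - 1)  # tolerate a reference stream that is shorter
    lo = max(0, center - half_width)
    hi = min(len(reference), center + half_width + 1)
    window = reference[lo:hi]
    n_bins = len(window[0])
    return [sum(f[k] for f in window) / len(window) for k in range(n_bins)]


def binary_tf_mask(target: Sequence[Spectrum], reference: Sequence[Spectrum],
                   half_width: int = 2, threshold: float = 1.0) -> List[List[float]]:
    """Keep a time-frequency bin (1.0) only where the target microphone is louder than
    the smoothed reference by the given (hypothetical) threshold; otherwise zero it."""
    masks = []
    for t, frame in enumerate(target):
        ref = smoothed_reference(reference, t, half_width)
        masks.append([1.0 if x > threshold * r else 0.0 for x, r in zip(frame, ref)])
    return masks


def apply_mask(target: Sequence[Spectrum], masks: Sequence[Sequence[float]]) -> List[Spectrum]:
    """Multiply each magnitude bin by its mask value."""
    return [[x * m for x, m in zip(frame, mask)] for frame, mask in zip(target, masks)]


if __name__ == "__main__":
    frames = [[1.0, 1.0, 1.0], [5.0, 1.2, 0.9], [1.1, 0.9, 1.0]]
    noise = estimate_noise_spectrum(frames, [False, True, False])
    print(spectral_subtraction(frames[1], noise))

    target = [[5.0, 0.5], [0.4, 3.0]]
    reference = [[1.0, 1.0], [1.0, 1.0]]
    print(apply_mask(target, binary_tf_mask(target, reference, half_width=1)))
```

Averaging the reference over several frames trades time resolution for robustness to the unknown network delay; a real system would also need the STFT, a voice activity detector, and resynthesis, which are omitted here.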

Keywords

computer aided instruction; computer networks; filtering theory; signal denoising; speech enhancement; speech recognition; time-frequency analysis; transport protocols; Griffiths-Jim method; TCP-IP based network; asynchronous signal; automatic speech recognition; blind source separation method; computer aided language laboratory system; delay and sum method; embedded timecode; interference speaker; multichannel method; network time protocol; noise reduction method; noise spectrum estimation; spectral filtering; spectral subtraction; speech enhancement method; stationary noise suppression; target speaker; time synching protocol; time-frequency masking method; transmission delay; utterance training system;

fLanguage

English

Publisher

IEEE

Conference_Title

TENCON 2010 - 2010 IEEE Region 10 Conference

Conference_Location

Fukuoka

ISSN

pending

Print_ISBN

978-1-4244-6889-8

Type

conf

DOI

10.1109/TENCON.2010.5686064

Filename

5686064