• DocumentCode
    2038718
  • Title
    Speech enhancement method based on spectral filtering utilizing asynchronous signals with embedded timecode over TCP/IP based network
  • Author
    Masmima, Tomohisa ; Chisaki, Yoshifumi ; Usagawa, Tsuyoshi
  • Author_Institution
    Grad. Sch. of Sci. & Technol., Kumamoto Univ., Kumamoto, Japan
  • fYear
    2010
  • fDate
    21-24 Nov. 2010
  • Firstpage
    1341
  • Lastpage
    1346
  • Abstract
    An utterance training system utilizing automatic speech recognition (ASR) has been developed as a computer-aided language laboratory system. Because surrounding noise seriously degrades ASR performance, many noise reduction methods have been proposed. In an utterance training system in particular, learners often sit side by side in a classroom, so each learner's utterances interfere with the other learners' utterances. Spectral subtraction, one such noise reduction method, suppresses stationary noise by subtracting a noise spectrum estimated from the observed signal during intervals without voice activity. However, it relies on the assumption that the noise is stationary. Multi-channel methods such as Delay and Sum, Griffiths-Jim, and various Blind Source Separation methods can handle non-stationary noise, but they assume that all input signals are synchronized. In this paper, a time-frequency masking method is proposed that utilizes signals observed at distributed computers connected over a TCP/IP network. Because of the characteristics of TCP/IP-based networks, variable transmission delays are unavoidable, so the signals from different computers cannot be synchronized perfectly even when a time synchronization protocol such as the Network Time Protocol is used. The proposed method assumes that the noise spectrum is stable over a certain duration. Under this assumption, the asynchronous signals observed at the distributed computers are utilized for speech enhancement based on time-frequency masking. Simulation results show a possibility of improving ASR performance when several interfering speakers are present around the target speaker.
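    The abstract's key idea, that a noise spectrum stable over some duration lets loosely synchronized signals still drive a time-frequency mask, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes magnitude spectrograms (frame x bin) are already available for the target learner's microphone and for neighboring learners' microphones, and that a bin is kept when the target magnitude exceeds a time-averaged estimate from the neighbors. The names `smoothed_estimate`, `tf_mask`, `apply_mask`, `half_window`, and `threshold` are hypothetical.

```python
from typing import List

# Hypothetical representation: spectrogram[frame][bin] = magnitude.
Spectrogram = List[List[float]]


def smoothed_estimate(ref: Spectrogram, frame: int, half_window: int) -> List[float]:
    """Average a reference channel over frames [frame - half_window, frame + half_window].

    Averaging over a window (assumption: the interfering spectrum is roughly
    stable over this span) makes the estimate tolerant of small, unknown
    network-induced offsets between the target and reference signals.
    """
    lo = max(0, frame - half_window)
    hi = min(len(ref), frame + half_window + 1)
    n_bins = len(ref[0])
    return [sum(ref[t][k] for t in range(lo, hi)) / (hi - lo) for k in range(n_bins)]


def tf_mask(target: Spectrogram, refs: List[Spectrogram],
            half_window: int = 2, threshold: float = 1.0) -> List[List[int]]:
    """Binary mask: 1 where the target bin dominates every smoothed reference."""
    mask = []
    for t, frame in enumerate(target):
        interference = [0.0] * len(frame)
        for ref in refs:
            # Clamp the frame index in case a reference stream is shorter (late packets).
            est = smoothed_estimate(ref, min(t, len(ref) - 1), half_window)
            interference = [max(a, b) for a, b in zip(interference, est)]
        mask.append([1 if x > threshold * n else 0 for x, n in zip(frame, interference)])
    return mask


def apply_mask(target: Spectrogram, mask: List[List[int]]) -> Spectrogram:
    """Zero out bins judged to be dominated by interfering speakers."""
    return [[x * m for x, m in zip(f, mf)] for f, mf in zip(target, mask)]


if __name__ == "__main__":
    target = [[5.0, 1.0]] * 4
    # Neighbor channel arriving one frame late (simulated transmission delay).
    neighbor = [[0.0, 0.0], [1.0, 3.0], [1.0, 3.0], [1.0, 3.0]]
    m = tf_mask(target, [neighbor])
    print(m)
    print(apply_mask(target, m))
```

    In this toy case the temporal averaging keeps the mask unchanged despite the one-frame delay on the neighbor channel, which mirrors (under the stated assumptions) why a stable-noise assumption can tolerate imperfect synchronization. A real system would also need the STFT and resynthesis steps, which are omitted here.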
  • Keywords
    computer aided instruction; computer networks; filtering theory; signal denoising; speech enhancement; speech recognition; time-frequency analysis; transport protocols; Griffiths-Jim method; TCP-IP based network; asynchronous signal; automatic speech recognition; blind source separation method; computer aided language laboratory system; delay and sum method; embedded timecode; interference speaker; multichannel method; network time protocol; noise reduction method; noise spectrum estimation; spectral filtering; spectral subtraction; speech enhancement method; stationary noise suppression; target speaker; time synching protocol; time-frequency masking method; transmission delay; utterance training system;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    TENCON 2010 - 2010 IEEE Region 10 Conference
  • Conference_Location
    Fukuoka
  • ISSN
    pending
  • Print_ISBN
    978-1-4244-6889-8
  • Type
    conf
  • DOI
    10.1109/TENCON.2010.5686064
  • Filename
    5686064