Regional Information Center for Science and Technology - Speech Activity Detection for Multi-Party Conversation Analyses Based on Likelihood Ratio Test on Spatial Magnitude

DocumentCode :

1493816

Title :

Speech Activity Detection for Multi-Party Conversation Analyses Based on Likelihood Ratio Test on Spatial Magnitude

Author :

Ishizuka, Kentaro ; Araki, Shoko ; Kawahara, Tatsuya

Author_Institution :

NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan

Volume :

Issue :

fYear :

2010

Firstpage :

1354

Lastpage :

1365

Abstract :

This paper proposes a microphone array-based speech activity detection (SAD) method for analyzing multi-party conversations recorded in the presence of noise. In particular, the proposed method considers conversations where the number of speakers and the speaker locations cannot be restricted, such as conversations held while standing and at poster sessions. When such conversations are observed, directional noise sources and diffuse noise affect the direction-of-arrival estimates of the target speech signals. To detect speech activity without a priori knowledge about the speakers and noise environments, a likelihood ratio test (LRT)-based SAD method is applied to the spatial magnitude, which is estimated by time-frequency masking of the observed spectra. The proposed method can exploit the enhanced signals obtained from time-frequency masking, and works even in the presence of environmental noise. Experiments with recorded simulated poster sessions confirmed that the proposed method could outperform conventional methods based on the LRT for a single channel, magnitude coherence, or crosspower spectrum phase.
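
The abstract does not give the test statistic, so the following is only a minimal sketch of a generic frame-level likelihood ratio test under a Gaussian signal model. It is not the authors' implementation. The function names, the maximum-likelihood a priori SNR estimate, the fixed threshold, and the assumption that the input is a list of per-bin power values that have already been through time-frequency masking are all illustrative choices and do not come from the record.

```python
import math


def frame_log_likelihood_ratio(power, noise_power, floor=1e-12):
    """Mean per-bin log-likelihood ratio for one frame (Gaussian model).

    power       -- per-bin power of the (assumed already masked) frame
    noise_power -- per-bin noise power estimate (assumed known here)
    """
    if not power or len(power) != len(noise_power):
        raise ValueError("power and noise_power must be non-empty and equal length")
    total = 0.0
    for p, n in zip(power, noise_power):
        gamma = p / max(n, floor)       # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)      # ML estimate of the a priori SNR (assumption)
        # Log-likelihood ratio of "speech + noise" against "noise only" for this bin
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)
    return total / len(power)


def detect_speech(frames, noise_power, threshold=0.5):
    """Mark a frame as speech when its mean log-likelihood ratio exceeds threshold."""
    return [frame_log_likelihood_ratio(f, noise_power) > threshold for f in frames]


if __name__ == "__main__":
    noise = [1.0, 1.0, 1.0, 1.0]
    frames = [[1.0, 1.0, 1.0, 1.0], [10.0, 10.0, 10.0, 10.0]]
    print(detect_speech(frames, noise))
```

In this sketch a frame whose power equals the noise estimate scores zero, and the score grows as the frame power rises above the noise. The threshold value of 0.5 is arbitrary and would need tuning on real data.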

Keywords :

direction-of-arrival estimation; microphone arrays; signal detection; speaker recognition; time-frequency analysis; crosspower spectrum phase; diffuse noise; direction of arrival estimations; directional noise sources; likelihood ratio test based SAD method; magnitude coherence; microphone array-based speech activity detection method; multiparty conversation analysis; observed spectra; spatial magnitude; speech activity detection; target speech signals; time-frequency masking estimation; Direction of arrival estimation; microphone arrays; multi-party conversations; spatial information; speech activity detection (SAD);

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2033955

Filename :

5280323

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1493816