DocumentCode
22810
Title
Sparse DNN-based speaker segmentation using side information
Author
Yong Ma; Chang-chun Bao
Author_Institution
Speech & Audio Signal Process. Lab., Beijing Univ. of Technol., Beijing, China
Volume
51
Issue
8
fYear
2015
fDate
16 April 2015
Firstpage
651
Lastpage
653
Abstract
A sparse deep neural network (SDNN) approach to speaker segmentation is proposed. First, the SDNNs are trained using side information, namely the class label of each input. Then, speaker-specific features are extracted by the SDNNs from the super-vector features of the speech signal. Finally, each speech frame is labelled by k-means clustering, and these labels are used to segment the different speakers in a continuous speech stream. A performance evaluation on a multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms both the Bayesian information criterion (BIC) method and the deep auto-encoder network method.
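For illustration, the sketch below shows only the final stage of the pipeline described in the abstract: frame-level speaker-specific features are clustered with k-means and speaker-change points are read off wherever the frame label changes. The feature dimensions, speaker count and random placeholder features are assumptions for demonstration and do not reflect the authors' actual SDNN outputs or implementation.

```python
# Minimal sketch of the frame-clustering/segmentation stage, assuming
# hypothetical SDNN-derived embeddings (random placeholders below).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_frames, emb_dim, n_speakers = 300, 64, 2  # illustrative assumptions

# Placeholder for speaker-specific features that the trained SDNN would
# extract from the super-vector features of each speech frame.
embeddings = rng.standard_normal((n_frames, emb_dim))

# K-means assigns a speaker label to every frame.
labels = KMeans(n_clusters=n_speakers, n_init=10,
                random_state=0).fit_predict(embeddings)

# A speaker-change point is hypothesised wherever consecutive frame
# labels differ; these points segment the continuous speech stream.
change_points = np.flatnonzero(np.diff(labels)) + 1
print("hypothesised speaker-change frames:", change_points[:10])
```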
Keywords
Bayes methods; audio databases; feature extraction; neural nets; pattern clustering; speaker recognition; BIC method; Bayesian information criterion method; SDNN; TIMIT database; continuous speech stream; deep auto-encoder networks method; input class label; k-means clustering; multispeaker speech stream corpus; side information; sparse DNN-based speaker segmentation; sparse deep neural networks; speaker-specific feature extraction; speech frame; speech signal; supervector feature;
fLanguage
English
Journal_Title
Electronics Letters
Publisher
IET
ISSN
0013-5194
Type
jour
DOI
10.1049/el.2015.0298
Filename
7084271