DocumentCode
22810
Title
Sparse DNN-based speaker segmentation using side information
Author
Yong Ma; Chang-chun Bao
Author_Institution
Speech & Audio Signal Process. Lab., Beijing Univ. of Technol., Beijing, China
Volume
51
Issue
8
fYear
2015
fDate
16 April 2015
Firstpage
651
Lastpage
653
Abstract
A sparse deep neural network (SDNN) approach to speaker segmentation is proposed. First, the SDNNs are trained using side information, namely the class label of each input. Then, speaker-specific features are extracted by the SDNNs from the super-vector features of the speech signal. Finally, each speech frame is labelled by k-means clustering, and these labels are used to segment the different speakers in a continuous speech stream. A performance evaluation on a multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms both the Bayesian information criterion (BIC) method and the deep auto-encoder network method.
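For illustration, the sketch below shows only the final stage of the pipeline described in the abstract: frame-level speaker-specific features are clustered with k-means and speaker-change points are read off wherever the frame label changes. The feature dimensions, speaker count and random placeholder features are assumptions for demonstration and do not reflect the authors' actual SDNN outputs or implementation.

```python
# Minimal sketch of the frame-clustering/segmentation stage, assuming
# hypothetical SDNN-derived embeddings (random placeholders below).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_frames, emb_dim, n_speakers = 300, 64, 2  # illustrative assumptions

# Placeholder for speaker-specific features that the trained SDNN would
# extract from the super-vector features of each speech frame.
embeddings = rng.standard_normal((n_frames, emb_dim))

# K-means assigns a speaker label to every frame.
labels = KMeans(n_clusters=n_speakers, n_init=10,
                random_state=0).fit_predict(embeddings)

# A speaker-change point is hypothesised wherever consecutive frame
# labels differ; these points segment the continuous speech stream.
change_points = np.flatnonzero(np.diff(labels)) + 1
print("hypothesised speaker-change frames:", change_points[:10])
```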
Keywords
Bayes methods; audio databases; feature extraction; neural nets; pattern clustering; speaker recognition; BIC method; Bayesian information criterion method; SDNN; TIMIT database; continuous speech stream; deep auto-encoder networks method; input class label; k-means clustering; multispeaker speech stream corpus; side information; sparse DNN-based speaker segmentation; sparse deep neural networks; speaker-specific feature extraction; speech frame; speech signal; supervector feature;
fLanguage
English
Journal_Title
Electronics Letters
Publisher
IET
ISSN
0013-5194
Type
jour
DOI
10.1049/el.2015.0298
Filename
7084271