DocumentCode :
22810
Title :
Sparse DNN-based speaker segmentation using side information
Author :
Yong Ma ; Chang-chun Bao
Author_Institution :
Speech & Audio Signal Process. Lab., Beijing Univ. of Technol., Beijing, China
Volume :
51
Issue :
8
fYear :
2015
fDate :
4 16 2015
Firstpage :
651
Lastpage :
653
Abstract :
Sparse deep neural networks (SDNNs) for speaker segmentation are proposed. First, the SDNNs are trained using side information, namely the class label of the input. Then, speaker-specific features are extracted from the super-vector feature of the speech signal by the SDNNs. Lastly, the label of each speech frame is obtained by K-means clustering, and these labels are used to segment the different speakers in a continuous speech stream. A performance evaluation on a multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms the Bayesian information criterion method and the deep auto-encoder networks method.
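The last step of the abstract, turning per-frame cluster labels into speaker segments, can be illustrated with a minimal sketch. The record does not say how the authors derive boundaries from the K-means labels, so the run-length grouping below is an assumption, and the function names `labels_to_segments` and `change_points` are hypothetical.

```python
from itertools import groupby


def labels_to_segments(frame_labels):
    """Collapse per-frame cluster labels into (start, end, label) runs.

    Assumption: a speaker segment is a maximal run of identical labels.
    `end` is exclusive, and indices are frame indices, not seconds.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments


def change_points(segments):
    """Return the frame indices at which the label (speaker) changes."""
    return [start for start, _, _ in segments[1:]]


if __name__ == "__main__":
    # Hypothetical K-means output for six frames and two clusters.
    labels = [0, 0, 0, 1, 1, 0]
    segs = labels_to_segments(labels)
    print(segs)
    print(change_points(segs))
```

In practice the frame labels would likely be smoothed before grouping so that isolated mislabelled frames do not create spurious boundaries; that step is omitted here.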
Keywords :
Bayes methods; audio databases; feature extraction; neural nets; pattern clustering; speaker recognition; BIC method; Bayesian information criterion method; SDNN; TIMIT database; continuous speech stream; deep auto-encoder networks method; input class label; k-means clustering; multispeaker speech stream corpus; side information; sparse DNN-based speaker segmentation; sparse deep neural networks; speaker-specific feature extraction; speech frame; speech signal; supervector feature;
fLanguage :
English
Journal_Title :
Electronics Letters
Publisher :
IET
ISSN :
0013-5194
Type :
jour
DOI :
10.1049/el.2015.0298
Filename :
7084271