• DocumentCode
    22810
  • Title
    Sparse DNN-based speaker segmentation using side information
  • Author
    Yong Ma ; Chang-chun Bao
  • Author_Institution
    Speech & Audio Signal Process. Lab., Beijing Univ. of Technol., Beijing, China
  • Volume
    51
  • Issue
    8
  • fYear
    2015
  • fDate
    4 16 2015
  • Firstpage
    651
  • Lastpage
    653
  • Abstract
    Sparse deep neural networks (SDNNs) for speaker segmentation are proposed. First, the SDNNs are trained using side information, namely the class label of the input. Then, speaker-specific features are extracted from the super-vector feature of the speech signal by the SDNNs. Lastly, the label of each speech frame is obtained by K-means clustering, and these labels are used to segment the different speakers in a continuous speech stream. A performance evaluation on a multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms the Bayesian information criterion method and the deep auto-encoder networks method.
  • Keywords
    Bayes methods; audio databases; feature extraction; neural nets; pattern clustering; speaker recognition; BIC method; Bayesian information criterion method; SDNN; TIMIT database; continuous speech stream; deep auto-encoder networks method; input class label; k-means clustering; multispeaker speech stream corpus; side information; sparse DNN-based speaker segmentation; sparse deep neural networks; speaker-specific feature extraction; speech frame; speech signal; supervector feature;
  • fLanguage
    English
  • Journal_Title
    Electronics Letters
  • Publisher
    IET
  • ISSN
    0013-5194
  • Type
    jour
  • DOI
    10.1049/el.2015.0298
  • Filename
    7084271