• DocumentCode
    43257
  • Title
    An Unsupervised Approach to Cochannel Speech Separation
  • Author
    Hu, Ke; Wang, DeLiang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • Volume
    21
  • Issue
    1
  • fYear
    2013
  • fDate
    Jan. 2013
  • Firstpage
    122
  • Lastpage
    131
  • Abstract
    Cochannel (two-talker) speech separation is predominantly addressed using pretrained speaker-dependent models. In this paper, we propose an unsupervised approach to separating cochannel speech. Our approach follows the two main stages of computational auditory scene analysis: segmentation and grouping. For voiced speech segregation, the proposed system utilizes a tandem algorithm for simultaneous grouping and then unsupervised clustering for sequential grouping. The clustering is performed by a search to maximize the ratio of between- and within-group speaker distances while penalizing within-group concurrent pitches. To segregate unvoiced speech, we first produce unvoiced speech segments based on onset/offset analysis. The segments are grouped using the complementary binary masks of segregated voiced speech. Despite its simplicity, our approach produces significant SNR improvements across a range of input SNRs. The proposed system yields competitive performance in comparison to other speaker-independent and model-based methods.
  • Keywords
    pattern clustering; source separation; speech processing; between-group speaker distance; cochannel speech separation; complementary binary mask; computational auditory scene analysis; grouping stage; onset-offset analysis; pretrained speaker dependent model; segmentation stage; sequential grouping; unsupervised clustering; unvoiced speech segment; unvoiced speech segregation; within-group concurrent pitch; within-group speaker distance; Algorithm design and analysis; Clustering algorithms; Computational modeling; Hidden Markov models; Signal to noise ratio; Speech; Time frequency analysis; Computational auditory scene analysis (CASA)
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2215591
  • Filename
    6303834