• DocumentCode
    1761902
  • Title

    Joint Mixing Vector and Binaural Model Based Stereo Source Separation

  • Author

    Alinaghi, Atiyeh ; Jackson, Philip J. B. ; Liu, Qingju ; Wang, Wenwu

  • Author_Institution
    Dept. of Electron. Eng., Univ. of Surrey, Guildford, UK
  • Volume
    22
  • Issue
    9
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    1434
  • Lastpage
    1448
  • Abstract
    In this paper, the mixing vector (MV) in the statistical mixing model is compared to the binaural cues represented by interaural level and phase differences (ILD and IPD). It is shown that the MV distributions are quite distinct while binaural models overlap when the sources are close to each other. On the other hand, the binaural cues are more robust to high reverberation than MV models. Motivated by this complementary behavior, we introduce a new robust algorithm for stereo speech separation which considers both additive and convolutive noise signals to model the MV and binaural cues in parallel and estimate probabilistic time-frequency masks. The contribution of each cue to the final decision is also adjusted by weighting the log-likelihoods of the cues empirically. Furthermore, the permutation problem of frequency-domain blind source separation (BSS) is addressed by initializing the MVs based on binaural cues. Experiments are performed systematically on determined and underdetermined speech mixtures in five rooms with various acoustic properties including anechoic, highly reverberant, and spatially diffuse noise conditions. The results in terms of signal-to-distortion ratio (SDR) confirm the benefits of integrating the MV and binaural cues, as compared with two state-of-the-art baseline algorithms which only use MV or the binaural cues.
  • Keywords
    blind source separation; reverberation; speech recognition; statistical distributions; time-frequency analysis; BSS; ILD; IPD; MV distribution; SDR; acoustic properties; additive noise signals; anechoic properties; binaural cues; binaural model based stereo source separation; convolutive noise signals; frequency domain blind source separation; interaural level and phase differences; joint mixing vector; probabilistic time-frequency masks; reverberant properties; signal-to-distortion-ratio; spatially-diffuse noise condition; speech mixtures; statistical mixing model; stereo speech separation; Additives; Noise; Reverberation; Source separation; Speech; Time-frequency analysis; Vectors; Blind source separation; computational auditory scene analysis; reverberation; time-frequency masking;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2014.2320637
  • Filename
    6807786