• DocumentCode
    1011521
  • Title

    Blind separation of speech mixtures via time-frequency masking

  • Author

    Yilmaz, Ozgur ; Rickard, Scott

  • Author_Institution
    Dept. of Math., Maryland Univ., College Park, MD, USA
  • Volume
    52
  • Issue
    7
  • fYear
    2004
  • fDate
    7/1/2004 12:00:00 AM
  • Firstpage
    1830
  • Lastpage
    1847
  • Abstract
    Binary time-frequency masks are powerful tools for the separation of sources from a single mixture. Perfect demixing via binary time-frequency masks is possible provided the time-frequency representations of the sources do not overlap: a condition we call W-disjoint orthogonality. We introduce here the concept of approximate W-disjoint orthogonality and present experimental results demonstrating the level of approximate W-disjoint orthogonality of speech in mixtures of various orders. The results demonstrate that there exist ideal binary time-frequency masks that can separate several speech signals from one mixture. While determining these masks blindly from just one mixture is an open problem, we show that we can approximate the ideal masks in the case where two anechoic mixtures are provided. Motivated by the maximum likelihood mixing parameter estimators, we define a power weighted two-dimensional (2-D) histogram constructed from the ratio of the time-frequency representations of the mixtures that is shown to have one peak for each source with peak location corresponding to the relative attenuation and delay mixing parameters. The histogram is used to create time-frequency masks that partition one of the mixtures into the original sources. Experimental results on speech mixtures verify the technique. Example demixing results can be found online at http://alum.mit.edu/www/rickard/bss.html.
  • Keywords
    blind source separation; maximum likelihood estimation; speech processing; time-frequency analysis; W-disjoint orthogonality; blind separation; demixing; maximum likelihood mixing parameter estimators; power weighted two-dimensional histogram; speech mixtures; speech signals; time-frequency masking; time-frequency representations; Attenuation; Delay estimation; Fourier transforms; Histograms; Lattices; Maximum likelihood estimation; Parameter estimation; Speech coding; Time frequency analysis; Two dimensional displays;
  • fLanguage
    English
  • Journal_Title
    Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1053-587X
  • Type

    jour

  • DOI
    10.1109/TSP.2004.828896
  • Filename
    1306640
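
The abstract states that binary time-frequency masks demix perfectly when the sources' time-frequency representations do not overlap, and that the ratio of the two mixtures' representations carries the relative attenuation and delay. The following is a minimal sketch of those two ideas, not the authors' implementation. The function names, the toy 2x2 "spectrograms", and the single-point anechoic model X2 = a * exp(-i * omega * delay) * X1 are assumptions made for illustration, and the delay estimate is only unambiguous while |omega * delay| < pi.

```python
import cmath

def ideal_binary_mask(target, interference):
    """Return 1 where the target dominates the interference at a time-frequency point, else 0."""
    return [[1 if abs(t) > abs(i) else 0 for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target, interference)]

def apply_mask(mask, mixture):
    """Multiply the mixture point by point with a binary mask."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]

def local_mixing_estimates(x1, x2, omega):
    """Estimate relative attenuation and delay at one time-frequency point.

    Assumes the (hypothetical) single-source model x2 = a * exp(-1j * omega * delay) * x1.
    """
    ratio = x2 / x1
    return abs(ratio), -cmath.phase(ratio) / omega

# Toy time-frequency representations (rows = frequency bins, columns = frames).
# The two sources never occupy the same point, i.e. they are W-disjoint orthogonal.
s1 = [[1 + 1j, 0], [0, 2]]
s2 = [[0, 3j], [0.5, 0]]
mixture = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]

mask1 = ideal_binary_mask(s1, s2)
estimate1 = apply_mask(mask1, mixture)  # equals s1 because the sources do not overlap

# Second mixture at one point, built from an assumed attenuation 0.8 and delay 0.5.
omega = 1.0
x1 = 1 + 1j
x2 = 0.8 * cmath.exp(-1j * omega * 0.5) * x1
attenuation, delay = local_mixing_estimates(x1, x2, omega)
```

With overlapping sources the masked estimate would only approximate the target, which is the situation the abstract's notion of approximate W-disjoint orthogonality addresses. Collecting the per-point (attenuation, delay) pairs into a power-weighted 2-D histogram is the next step described in the abstract and is not shown here.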