• DocumentCode
    2346375
  • Title

    Single-microphone blind audio source separation via Gaussian short+long term AR models

  • Author

    Schutz, Antony ; Slock, Dirk

  • Author_Institution
    Mobile Commun. Dept., EURECOM, Sophia Antipolis, France
  • fYear
    2010
  • fDate
    3-5 March 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Blind audio source separation (BASS) arises in a number of applications in speech and music processing such as speech enhancement, speaker diarization, automated music transcription, etc. Generally, BASS methods consider multichannel signal capture. The single microphone case is the most difficult underdetermined case, but it often arises in practice. In the approach considered here, the main source identifiability comes from exploiting the presumed quasi-periodic nature of sources via long-term autoregressive (AR) modeling. Indeed, musical note signals are quasi-periodic and so is voiced speech, which constitutes the most energetic part of speech signals. We furthermore exploit (e.g. speaker- or instrument-related) prior information in the spectral envelope of the source signals via short-term AR modeling, to also help unravel spectral portions where source harmonics overlap, and to provide a continuous treatment when sources (e.g. speech) temporarily lose their periodic nature. The novel processing considered here uses windowed signal frames and alternates between frequency- and time-domain processing for optimized computational complexity and approximation error. We consider Variational Bayesian techniques for joint source extraction and estimation of their AR parameters, the simplified versions of which correspond to EM or SAGE algorithms.
  • Keywords
    Gaussian processes; audio signal processing; autoregressive processes; blind source separation; Gaussian autoregressive modeling; blind audio source separation; computational complexity; multichannel signal capture; music processing; single microphone BASS; source estimation; source extraction; source harmonics; speaker diarization; speech enhancement; speech processing; speech signals; time domain processing; variational Bayesian techniques; Bayesian methods; Independent component analysis; Multiple signal classification; Process control; Signal processing; Source separation; Speech enhancement; Speech processing; Telecommunications; Yttrium; Autoregressive process; Blind Source Separation; Expectation Maximization; Linear Prediction; Speech Processing; Variational Bayes;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications, Control and Signal Processing (ISCCSP), 2010 4th International Symposium on
  • Conference_Location
    Limassol
  • Print_ISBN
    978-1-4244-6285-8
  • Type

    conf

  • DOI
    10.1109/ISCCSP.2010.5463308
  • Filename
    5463308