Title :
Nonnegative factorization of sequences of speech and music spectra
Author :
Arjona Ramirez, Miguel
Author_Institution :
Univ. of Sao Paulo, Sao Paulo, Brazil
Abstract :
Speech is separated from mixtures with music signals by means of nonnegative matrix factorization (NMF). The magnitude of the short-term Fourier transform (STFT) is factorized under the Kullback-Leibler divergence (KLD), while its power spectral density is factorized under the Itakura-Saito divergence (ISD). Separation takes place in the synthesis phase, where the mixture is factorized by unsupervised NMF over compound bases. The bases are either exemplar spectra selected at random in an unsupervised manner or spectra reached iteratively by a supervised factorization. Speech is represented four times more densely than music in the supervised case and almost six times more densely in the unsupervised case. The bases obtained by supervised NMF are shown to perform far better than the exemplar bases selected by the unsupervised procedure. Also, the quality of the separated speech obtained by NMF with KLD clearly exceeds that obtained by NMF with ISD.
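The factorization outlined in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the widely used multiplicative updates for the beta-divergence (beta = 1 gives KLD, intended for STFT magnitudes; beta = 0 gives ISD, intended for power spectra). The function names nmf_beta, divergence and separate are hypothetical, and the final soft-mask reconstruction is an assumption, since the abstract does not say how the separated spectra are rebuilt. The bases are held fixed while the mixture is factorized, which is also an assumption.

```python
import numpy as np


def nmf_beta(V, rank=None, W=None, beta=1.0, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative-update NMF, V ~= W @ H, under a beta-divergence.

    beta=1.0 -> Kullback-Leibler, beta=0.0 -> Itakura-Saito.
    If W is given it is kept fixed and only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    fixed = W is not None
    if fixed:
        W = np.asarray(W, dtype=float)
    else:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V * WH ** (beta - 2))) / (W.T @ WH ** (beta - 1) + eps)
        if not fixed:
            WH = W @ H + eps
            W *= ((V * WH ** (beta - 2)) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H


def divergence(V, V_hat, beta=1.0, eps=1e-12):
    """Generalized KL (beta=1) or Itakura-Saito (beta=0) divergence."""
    R = (V + eps) / (V_hat + eps)
    if beta == 1.0:
        return float(np.sum(V * np.log(R) - V + V_hat))
    if beta == 0.0:
        return float(np.sum(R - np.log(R) - 1.0))
    raise ValueError("this sketch only handles beta in {0, 1}")


def separate(V_mix, W_speech, W_music, beta=1.0, n_iter=200):
    """Factorize the mixture over compound bases [W_speech, W_music].

    Assumption: the sources are rebuilt with a soft (Wiener-like) mask.
    """
    W = np.hstack([W_speech, W_music])      # compound bases
    _, H = nmf_beta(V_mix, W=W, beta=beta, n_iter=n_iter)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]                # speech part of the model
    M_hat = W_music @ H[k:]                 # music part of the model
    mask = S_hat / (S_hat + M_hat + 1e-12)
    return mask * V_mix, (1.0 - mask) * V_mix
```

In this sketch, supervised bases would come from calling nmf_beta on speech-only and music-only training spectra, whereas exemplar bases would simply be randomly chosen columns of those spectra. V is assumed to be a nonnegative array of shape (frequency bins, frames); how the STFT is computed is left open.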
Keywords :
Fourier transforms; iterative methods; matrix decomposition; music; source separation; speech processing; speech synthesis; ISD; Itakura-Saito divergence; KLD; Kullback-Leibler divergence; NMF; STFT magnitude; music signals; music spectra sequences; nonnegative factorization; power spectral density; short-term Fourier transform; speech source separation; speech spectra sequences; supervised factorization; synthesis phase; Databases; Multiple signal classification; Speech; Speech recognition; Training; Vectors; Nonnegative matrix factorization; blind audio source separation; speech enhancement;
Conference_Title :
Telecommunications Symposium (ITS), 2014 International
Conference_Location :
Sao Paulo
DOI :
10.1109/ITS.2014.6948021