Title :
A sparse representation approach for perceptual quality improvement of separated speech
Author :
Donald S. Williamson ; Yuxuan Wang ; DeLiang Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Speech separation based on time-frequency masking has been shown to improve the intelligibility of speech signals corrupted by noise. A perceived weakness of binary masking is the quality of the separated speech. In this paper, an approach for improving the perceptual quality of separated speech from binary masking is proposed. Our approach consists of two stages. In the first stage, a binary mask is generated that effectively performs speech separation. In the second stage, a sparse-representation approach is used to represent the separated signal as a linear combination of short-time Fourier transform (STFT) magnitudes that are generated from a clean speech dictionary. Overlap-and-add synthesis is then used to generate an estimate of the speech signal. The performance of the proposed approach is evaluated with the Perceptual Evaluation of Speech Quality (PESQ), which is a standard objective speech quality measure. The proposed algorithm offers considerable improvements in speech quality over binary-masked noisy speech and other reconstruction approaches.
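The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the 0 dB local criterion, the greedy matching-pursuit solver, and the random toy dictionary are all assumptions made for illustration, since the abstract does not specify them. The sketch stops at the reconstructed magnitudes; an overlap-and-add synthesis step would follow.

```python
import numpy as np


def ideal_binary_mask(clean_mag, noise_mag, lc_db=0.0):
    """Stage 1 (sketch): keep time-frequency units whose local SNR exceeds lc_db."""
    eps = 1e-12  # avoids taking the log of zero
    snr_db = 20.0 * np.log10((clean_mag + eps) / (noise_mag + eps))
    return (snr_db > lc_db).astype(float)


def matching_pursuit(dictionary, frame, n_atoms):
    """Stage 2 (sketch): greedy sparse code of one magnitude frame.

    dictionary has shape (n_freq, n_dict) with unit-norm columns.
    Matching pursuit is an assumed solver, not the one named in the paper.
    """
    residual = frame.astype(float)  # astype returns a copy
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        scores = dictionary.T @ residual
        k = int(np.argmax(scores))
        if scores[k] <= 0.0:  # no atom correlates positively with the residual
            break
        coeffs[k] += scores[k]
        residual -= scores[k] * dictionary[:, k]
    return coeffs


def reconstruct_magnitudes(dictionary, masked_mag, n_atoms=5):
    """Replace each masked frame by a sparse combination of dictionary atoms."""
    frames = [
        dictionary @ matching_pursuit(dictionary, masked_mag[:, t], n_atoms)
        for t in range(masked_mag.shape[1])
    ]
    return np.stack(frames, axis=1)


# Toy data: a random non-negative "clean speech dictionary" (hypothetical).
rng = np.random.default_rng(0)
clean_dict = np.abs(rng.standard_normal((16, 8)))
clean_dict /= np.linalg.norm(clean_dict, axis=0)

clean_mag = clean_dict[:, [1, 3, 5]]  # three clean frames
noise_mag = 0.5 * np.abs(rng.standard_normal(clean_mag.shape))

mask = ideal_binary_mask(clean_mag, noise_mag)
masked_mag = mask * (clean_mag + noise_mag)  # binary-masked noisy magnitudes
estimate = reconstruct_magnitudes(clean_dict, masked_mag)
```

In a full pipeline, the estimated magnitudes would be combined with a phase (for example the noisy phase, which is another assumption) and passed through an inverse STFT with overlap-and-add to obtain the time-domain signal.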
Keywords :
Fourier transforms; signal representation; speech intelligibility; speech synthesis; PESQ; STFT magnitudes; binary masking; binary-masked noisy speech; clean speech dictionary; linear combination; noise corruption; overlap-and-add synthesis; perceptual evaluation of speech quality; perceptual quality improvement; separated signal representation; separated speech quality; short-time Fourier transform; sparse representation approach; sparse-representation approach; speech quality measure; speech separation; speech signal estimation; speech signals intelligibility; time-frequency masking; Dictionaries; Noise; Noise measurement; Spectrogram; Speech; Time-frequency analysis; Vectors; Binary Masking; Ideal Binary Mask (IBM); Sparse Representations; Speech Quality; Speech Separation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639022