DocumentCode
1687934
Title
A sparse representation approach for perceptual quality improvement of separated speech
Author
Donald S. Williamson; Yuxuan Wang; DeLiang Wang
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear
2013
Firstpage
7015
Lastpage
7019
Abstract
Speech separation based on time-frequency masking has been shown to improve the intelligibility of speech signals corrupted by noise. A perceived weakness of binary masking is the quality of the separated speech. In this paper, an approach for improving the perceptual quality of speech separated by binary masking is proposed. The approach consists of two stages. In the first stage, a binary mask is generated that effectively performs speech separation. In the second stage, a sparse-representation approach is used to represent the separated signal as a linear combination of short-time Fourier transform (STFT) magnitudes generated from a clean speech dictionary. Overlap-and-add synthesis is then used to generate an estimate of the speech signal. The performance of the proposed approach is evaluated with the Perceptual Evaluation of Speech Quality (PESQ), a standard objective speech quality measure. The proposed algorithm offers considerable improvements in speech quality over binary-masked noisy speech and other reconstruction approaches.
Keywords
Fourier transforms; signal representation; speech intelligibility; speech synthesis; PESQ; STFT magnitudes; binary masking; binary-masked noisy speech; clean speech dictionary; linear combination; noise corruption; overlap-and-add synthesis; perceptual evaluation of speech quality; perceptual quality improvement; separated signal representation; separated speech quality; short-time Fourier transform; sparse representation approach; sparse-representation approach; speech quality measure; speech separation; speech signal estimation; speech signals intelligibility; time-frequency masking; Dictionaries; Noise; Noise measurement; Spectrogram; Speech; Time-frequency analysis; Vectors; Binary Masking; Ideal Binary Mask (IBM); Sparse Representations; Speech Quality; Speech Separation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639022
Filename
6639022