Regional Information Center for Science and Technology - A sparse representation approach for perceptual quality improvement of separated speech

DocumentCode :

1687934

Title :

A sparse representation approach for perceptual quality improvement of separated speech

Author :

Williamson, Donald S. ; Wang, Yuxuan ; Wang, DeLiang

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

Year :

2013

Firstpage :

7015

Lastpage :

7019

Abstract :

Speech separation based on time-frequency masking has been shown to improve intelligibility of speech signals corrupted by noise. A perceived weakness of binary masking is the quality of separated speech. In this paper, an approach for improving the perceptual quality of separated speech from binary masking is proposed. Our approach consists of two stages, where a binary mask is generated in the first stage that effectively performs speech separation. In the second stage, a sparse-representation approach is used to represent the separated signal by a linear combination of Short-time Fourier Transform (STFT) magnitudes that are generated from a clean speech dictionary. Overlap-and-add synthesis is then used to generate an estimate of the speech signal. The performance of the proposed approach is evaluated with the Perceptual Evaluation of Speech Quality (PESQ), which is a standard objective speech quality measure. The proposed algorithm offers considerable improvements in speech quality over binary-masked noisy speech and other reconstruction approaches.
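The two stages described in the abstract can be illustrated with a minimal sketch for a single STFT frame. This is not the authors' implementation. It assumes an ideal binary mask (computed from known speech and noise magnitudes, whereas the paper generates the mask in its first stage), a toy two-atom dictionary, additive magnitudes for the mixture, and a simple non-negative matching pursuit as the sparse coder. All function names, the `lc_db` threshold, and the example values are hypothetical. STFT analysis and overlap-and-add synthesis are omitted.

```python
import math

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Return 1 for each time-frequency unit whose local SNR (dB) exceeds lc_db, else 0."""
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        snr_db = 20.0 * math.log10((s + 1e-12) / (n + 1e-12))
        mask.append(1 if snr_db > lc_db else 0)
    return mask

def normalize(atom):
    """Scale a dictionary atom to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def sparse_code(target, atoms, max_atoms=2):
    """Greedy non-negative matching pursuit over unit-norm atoms (an assumed sparse coder)."""
    residual = list(target)
    weights = [0.0] * len(atoms)
    for _ in range(max_atoms):
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: scores[i])
        if scores[best] <= 0.0:
            break  # no atom reduces the residual with a non-negative weight
        weights[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return weights

def reconstruct(weights, atoms):
    """Linear combination of dictionary atoms, one value per frequency bin."""
    n_bins = len(atoms[0])
    return [sum(w * atom[k] for w, atom in zip(weights, atoms)) for k in range(n_bins)]

# Toy "clean speech dictionary" of STFT-magnitude atoms (hypothetical values).
dictionary = [normalize([1.0, 1.0, 0.0, 0.0]), normalize([0.0, 0.0, 1.0, 1.0])]

# One frame of magnitudes; the mixture is assumed additive for illustration only.
speech_mag = [2.0, 2.0, 0.1, 0.1]
noise_mag = [0.5, 0.5, 1.0, 1.0]
mixture_mag = [s + n for s, n in zip(speech_mag, noise_mag)]

# Stage 1: binary masking of the noisy frame.
mask = ideal_binary_mask(speech_mag, noise_mag)
masked = [m * x for m, x in zip(mask, mixture_mag)]

# Stage 2: represent the masked frame as a sparse combination of clean atoms.
weights = sparse_code(masked, dictionary)
estimate = reconstruct(weights, dictionary)
```

In a full system, `estimate` would be computed for every frame, combined with a phase estimate, and passed through overlap-and-add synthesis to produce a time-domain signal.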

Keywords :

Fourier transforms; signal representation; speech intelligibility; speech synthesis; PESQ; STFT magnitudes; binary masking; binary-masked noisy speech; clean speech dictionary; linear combination; noise corruption; overlap-and-add synthesis; perceptual evaluation of speech quality; perceptual quality improvement; separated signal representation; separated speech quality; short-time Fourier transform; sparse representation approach; sparse-representation approach; speech quality measure; speech separation; speech signal estimation; speech signals intelligibility; time-frequency masking; Dictionaries; Noise; Noise measurement; Spectrogram; Speech; Time-frequency analysis; Vectors; Binary Masking; Ideal Binary Mask (IBM); Sparse Representations; Speech Quality; Speech Separation;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639022

Filename :

6639022

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1687934