Regional Information Center for Science and Technology - Exploring speaker-specific characteristics with deep learning

DocumentCode :

3492141

Title :

Exploring speaker-specific characteristics with deep learning

Author :

Salman, Ahmad ; Chen, Ke

Author_Institution :

Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK

fYear :

2011

fDate :

July 31 2011-Aug. 5 2011

Firstpage :

103

Lastpage :

110

Abstract :

Speech signals convey different types of information, ranging from linguistic to speaker-specific, and each type should be used in a different task. However, it is hard to extract one specific type of information, because nearly all acoustic representations of speech present all kinds of information as a whole. Using the same representation for different tasks makes it difficult to achieve good performance in either speech or speaker recognition. In this paper, we present a deep neural architecture to explore speaker-specific characteristics from the popular Mel-frequency cepstral coefficients. For learning, we propose an objective function consisting of a contrastive cost in terms of speaker similarity and dissimilarity, as well as a data reconstruction cost used as regularization to normalize non-speaker-related information. The deep architecture is learned by greedy layerwise local unsupervised training for initialization, followed by global supervised discriminative training to extract a speaker-specific representation. Using two narrow-band benchmark corpora, we demonstrate that our deep architecture generates a robust overcomplete speech representation for characterizing various speakers, and that this new representation yields favorable performance in speaker verification.
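
The objective described in the abstract combines a contrastive cost over speaker pairs with a reconstruction cost used as a regularizer. The abstract does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: the margin-based contrastive form, the mean-squared reconstruction error, the weighting parameter `alpha`, and all function names are illustrative choices, not the authors' definitions.

```python
import numpy as np

def contrastive_cost(rep_a, rep_b, same_speaker, margin=1.0):
    """Pairwise contrastive cost on two representation vectors.

    Assumed form (not taken from the paper): squared distance for
    same-speaker pairs, squared hinge on the margin otherwise.
    """
    dist = float(np.linalg.norm(np.asarray(rep_a) - np.asarray(rep_b)))
    if same_speaker:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def reconstruction_cost(x, x_hat):
    """Mean squared error between an input frame and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

def combined_cost(x_a, x_b, recon_a, recon_b, rep_a, rep_b,
                  same_speaker, alpha=0.5, margin=1.0):
    """Weighted sum of reconstruction and contrastive costs.

    `alpha` is a hypothetical trade-off weight between the two terms.
    """
    recon = reconstruction_cost(x_a, recon_a) + reconstruction_cost(x_b, recon_b)
    contrast = contrastive_cost(rep_a, rep_b, same_speaker, margin)
    return alpha * recon + (1.0 - alpha) * contrast
```

In this sketch the inputs `x_a` and `x_b` would stand for MFCC frames, `recon_a` and `recon_b` for the network's reconstructions of them, and `rep_a` and `rep_b` for the learned speaker representations; how the paper actually wires these together is not stated in this record.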

Keywords :

acoustic signal processing; greedy algorithms; speaker recognition; speech processing; unsupervised learning; Mel frequency cepstral coefficient; acoustic speech representation; data reconstruction cost; deep learning; deep neural architecture; global supervised discriminative training; greedy layerwise local unsupervised training; information extraction; linguistic; speaker dissimilarity; speaker recognition; speaker similarity; speaker verification; speaker-specific characteristics; speech recognition; Cost function; DNA; Neurons; Speaker recognition; Speech; Speech recognition; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Neural Networks (IJCNN), The 2011 International Joint Conference on

Conference_Location :

San Jose, CA

ISSN :

2161-4393

Print_ISBN :

978-1-4244-9635-8

Type :

conf

DOI :

10.1109/IJCNN.2011.6033207

Filename :

6033207

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3492141