Regional Information Center for Science and Technology (RICeST) - Power Normalized Cepstral Coefficients based supervectors and i-vectors for small vocabulary speech recognition

DocumentCode :

1797644

Title :

Power Normalized Cepstral Coefficients based supervectors and i-vectors for small vocabulary speech recognition

Author :

Principi, Emanuele ; Squartini, Stefano ; Piazza, Francesco

Author_Institution :

Dept. of Inf. Eng., Univ. Politec. delle Marche, Ancona, Italy

Year :

2014

Date :

6-11 July 2014

Firstpage :

3562

Lastpage :

3568

Abstract :

Template-matching and discriminative techniques, such as support vector machines (SVMs), have been widely used for automatic speech recognition. Both methods require that variable-length sequences be mapped to fixed-length vectors: in template-matching the problem is solved by means of dynamic time warping (DTW), while in SVMs it is solved with dynamic kernels. The supervector and i-vector paradigms appear to offer a valid solution to this problem when SVMs are employed for classification. In this work, Gaussian mean supervectors (GMS), Gaussian posterior probability supervectors (GPPS), and i-vectors are comparatively evaluated as features for both template-matching and SVM-based speech recognition. All these features are based on Power Normalized Cepstral Coefficients (PNCCs) extracted directly from speech utterances. The methods are assessed on small vocabulary speech recognition tasks using two distinct corpora and compared against DTW, the dynamic time alignment kernel (DTAK), the outer product of trajectory matrix, and PocketSphinx as further recognition techniques. Experimental results show that the supervector and i-vector based solutions are appropriate with respect to the other state-of-the-art techniques addressed here.
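
As an illustration of the template-matching baseline named in the abstract, the following is a minimal sketch of textbook dynamic time warping used as a nearest-template classifier. It is not the authors' implementation: the function names `dtw_distance` and `classify`, the Euclidean local distance, the unconstrained step pattern, and the toy one-dimensional "features" are all assumptions made for illustration, and real inputs would be sequences of PNCC vectors.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of equal-dimension vectors.

    Assumption: Euclidean local distance and the basic step pattern
    (insertion, deletion, match) with no slope or window constraints.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(utterance, templates):
    """Return the label of the template with the smallest DTW cost.

    `templates` is a list of (label, sequence) pairs (hypothetical layout).
    """
    best_label, _ = min(templates, key=lambda t: dtw_distance(utterance, t[1]))
    return best_label


# Toy data: one-dimensional frames standing in for real feature vectors.
templates = [
    ("yes", [(0.0,), (1.0,), (2.0,)]),
    ("no", [(5.0,), (5.0,)]),
]
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
label = classify(utterance, templates)
```

The utterance and the templates have different lengths, which is the situation the abstract describes: DTW handles it by warping the time axis, whereas the supervector and i-vector approaches instead map each utterance to a single fixed-length vector.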

Keywords :

Gaussian processes; cepstral analysis; pattern matching; speech recognition; support vector machines; DTW; GMS; GPPS; Gaussian mean supervectors; Gaussian posterior probability supervectors; PNCCs; SVM-based speech recognition; SVMs; automatic speech recognition; discriminative techniques; dynamic kernels; dynamic time warping; i-vector paradigm; i-vectors; power normalized cepstral coefficients; speech utterances; supervector paradigm; support vector machines; template-matching; vocabulary speech recognition; Hidden Markov models; Kernel; Microphones; Speech recognition; Support vector machines; Training; Vectors;

Language :

English

Publisher :

IEEE

Conference_Title :

Neural Networks (IJCNN), 2014 International Joint Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4799-6627-1

Type :

conf

DOI :

10.1109/IJCNN.2014.6889552

Filename :

6889552

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1797644