Title :
Power Normalized Cepstral Coefficients based supervectors and i-vectors for small vocabulary speech recognition
Author :
Principi, Emanuele ; Squartini, Stefano ; Piazza, Francesco
Author_Institution :
Dept. of Inf. Eng., Univ. Politec. delle Marche, Ancona, Italy
Abstract :
Template-matching and discriminative techniques, such as support vector machines (SVMs), have been widely used for automatic speech recognition. Both methods require that variable-length sequences be mapped to fixed-length vectors: in template matching, the problem is solved by means of dynamic time warping (DTW), while in SVMs it is solved by means of dynamic kernels. The supervector and i-vector paradigms seem to represent a valid solution to such a problem when SVMs are employed for classification. In this work, Gaussian mean supervectors (GMS), Gaussian posterior probability supervectors (GPPS) and i-vectors are comparatively evaluated as features for both template matching and SVM-based speech recognition. All these features are based on Power Normalized Cepstral Coefficients (PNCCs) extracted directly from speech utterances. The different methods are assessed in small vocabulary speech recognition tasks using two distinct corpora, and they are compared against DTW, dynamic time alignment kernel (DTAK), outer product of trajectory matrix, and PocketSphinx. Experimental results showed the appropriateness of the supervector and i-vector based solutions with respect to the other state-of-the-art techniques addressed here.
Keywords :
Gaussian processes; cepstral analysis; pattern matching; speech recognition; support vector machines; DTW; GMS; GPPS; Gaussian mean supervectors; Gaussian posterior probability supervectors; PNCCs; SVM-based speech recognition; SVMs; automatic speech recognition; discriminative techniques; dynamic kernels; dynamic time warping; i-vector paradigm; i-vectors; power normalized cepstral coefficients; speech utterances; supervector paradigm; template-matching; vocabulary speech recognition; Hidden Markov models; Kernel; Microphones; Training; Vectors;
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889552