DocumentCode
134252
Title
Effectiveness of fractal dimension for ASR in low resource language
Author
Zaki, Mohammadi ; Shah, N.J. ; Patil, Hemant A.
Author_Institution
Dhirubhai Ambani Inst. of Inf. & Commun. Technol. (DA-IICT), Gandhinagar, India
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
464
Lastpage
468
Abstract
We propose using multiscale fractal dimension (MFD) as a component of feature vectors for automatic speech recognition (ASR), especially for low-resource languages. Speech, which is known to be a nonlinear process, can be efficiently represented by extracting nonlinear properties, such as fractal dimension, from the speech segment. During speech production, vortices (generated due to the presence of separated airflow) may travel along the vocal tract and excite vocal tract resonators at the epiglottis, velum, palate, teeth, lips, etc. By Kolmogorov's law, the gradient in energy levels between these vortices produces turbulence. This ruggedness, and in effect the embedded features of different phoneme classes, can be captured by the invariant property of fractal dimension (FD). Furthermore, speech is multifractal, which justifies the use of multiscale fractal dimension as feature components for speech. In this paper, we describe the multifractal nature of the speech signal and use this property for the automatic phonetic segmentation task. The results show a significant decrease in % EER (≈ 4.2% from traditional MFCC base features and ≈ 2.5% from MFCC appended with 1-D fractal dimension). The DET curves clearly show improved performance with the new multiscale fractal dimension-based features for the low-resource language under consideration.
Keywords
speech recognition; vectors; ASR; DET curve; Kolmogorov law; MFCC base features; automatic phonetic segmentation; automatic speech recognition; energy level; feature vector; low resource language; multiscale fractal dimension; nonlinear property; phoneme class; speech production; vocal tract resonator; vortices; Feature extraction; Fractals; Mel frequency cepstral coefficient; Production; Speech; Speech processing; Vectors; Automatic phonetic segmentation; multifractal; multiscale fractal dimension; nonlinearities
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936645
Filename
6936645