DocumentCode :
134267
Title :
Exploiting speech source information for vowel landmark detection for low resource language
Author :
Undhad, Ankur G. ; Patil, Hemant A. ; Madhavi, Maulik C.
Author_Institution :
Dhirubhai Ambani Inst. of Inf. & Commun. Technol. (DA-IICT), Gandhinagar, India
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
546
Lastpage :
550
Abstract :
Landmarks are time instants in a speech signal that mark important events (such as vowels, consonants and glides). This paper proposes a novel vowel landmark detection (VLD) algorithm for a low-resource language, viz., Gujarati, an Indian language. The proposed VLD method uses speech source information to detect vowel landmarks, which are points of high sonority. The excitation peaks in the Hilbert envelope of the Teager energy profile of the zero-frequency filtered (ZFF) speech signal can be interpreted as a perceptually significant feature that contributes to loudness. The performance of the proposed VLD method is compared with that of an existing loudness-based method. Results are reported for Gujarati speech recorded in three modes, viz., read, spontaneous and lecture, with manual phonetic transcription by transcribers used as the ground truth. The proposed VLD algorithm achieves detection rates of 78.92 %, 76.40 % and 73.89 %, which are 8.79 %, 7.23 % and 7.17 % higher than those of the loudness-based method in lecture, spontaneous and read mode, respectively. The proposed algorithm is also shown to be robust against signal degradations such as white noise. In addition, the proposed algorithm is fast and requires no training.
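The processing chain named in the abstract (ZFF signal, Teager energy profile, Hilbert envelope, peak picking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no parameter values, so the function names, the 10 ms trend-removal window, the 50 ms smoothing window and the relative peak threshold below are all assumptions chosen only for illustration.

```python
import numpy as np

def zero_frequency_filter(s, fs, win_ms=10.0, passes=3):
    """ZFF sketch: difference the signal, integrate four times (two
    cascaded 0-Hz resonators), then remove the trend by repeated
    local-mean subtraction. win_ms and passes are assumed values."""
    s = np.asarray(s, dtype=float)
    y = np.diff(s, prepend=s[0])
    for _ in range(4):
        y = np.cumsum(y)
    half = max(1, int(round(win_ms * 1e-3 * fs / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    margin = passes * half
    if margin > 0:
        # The zero-padded convolution corrupts the edges, so blank them.
        y[:margin] = 0.0
        y[-margin:] = 0.0
    return y

def teager_energy(x):
    """Discrete Teager energy operator: x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed through the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def vowel_landmark_candidates(s, fs, smooth_ms=50.0, rel_threshold=0.3):
    """Return (peak indices, smoothed envelope). The smoothing length and
    the threshold are illustrative assumptions, not values from the paper."""
    env = hilbert_envelope(teager_energy(zero_frequency_filter(s, fs)))
    w = max(1, int(round(smooth_ms * 1e-3 * fs)))
    env = np.convolve(env, np.ones(w) / w, mode="same")
    thr = rel_threshold * env.max()
    i = np.arange(1, len(env) - 1)
    is_peak = (env[i] > env[i - 1]) & (env[i] >= env[i + 1]) & (env[i] > thr)
    return i[is_peak], env
```

Calling `vowel_landmark_candidates(signal, fs)` on a mono waveform returns sample indices of envelope peaks; dividing them by `fs` gives candidate landmark times in seconds. How closely such candidates match the paper's landmarks depends on details the abstract does not specify.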
Keywords :
filtering theory; speech processing; speech recognition; Gujarati language; Hilbert envelope; Indian language; Teager energy profile; VLD algorithm; ZFF speech signal; consonants; excitation peaks; glides; ground truth; high-sonority points; lecture mode; loudness; low-resource language; manual phonetic transcription; perceptually significant feature; read mode; signal degradation robustness; signal detection rate; speech source information; spontaneous mode; time instants; vowel landmark detection algorithm; white noise; zero-frequency filtered speech signal; Acoustics; Databases; Feature extraction; Signal to noise ratio; Speech; Speech processing; System-on-chip; Landmark; Teager energy operator (TEO); loudness; sonority; vowel-nucleus; zero-frequency resonator (ZFR);
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936660
Filename :
6936660