Regional Information Center for Science and Technology (RICeST) - Exploiting speech source information for vowel landmark detection for low resource language

DocumentCode :

134267

Title :

Exploiting speech source information for vowel landmark detection for low resource language

Author :

Undhad, Ankur G. ; Patil, Hemant A. ; Madhavi, Maulik C.

Author_Institution :

Dhirubhai Ambani Inst. of Inf. & Commun. Technol. (DA-IICT), Gandhinagar, India

fYear :

2014

fDate :

12-14 Sept. 2014

Firstpage :

546

Lastpage :

550

Abstract :

Landmarks are time instants in a speech signal that mark important events (such as vowels, consonants, and glides). This paper proposes a novel vowel landmark detection (VLD) algorithm for a low-resource language, viz., Gujarati, an Indian language. The proposed VLD method uses speech source information to detect vowel landmarks, which are points of high sonority. The excitation peaks in the Hilbert envelope of the Teager energy profile of the zero-frequency filtered (ZFF) speech signal can be interpreted as a perceptually significant feature that contributes to loudness. The performance of the proposed VLD method is compared with that of an existing loudness-based method. Results are reported for Gujarati speech recorded in three modes, viz., read, spontaneous, and lecture, with manual phonetic transcriptions by transcribers used as the ground truth. The proposed VLD algorithm performs better than the existing loudness-based method: it achieves detection rates of 78.92 %, 76.40 %, and 73.89 %, which are 8.79 %, 7.23 %, and 7.17 % higher than those of the loudness-based method in lecture, spontaneous, and read modes, respectively. The proposed algorithm is also shown to be robust against signal degradations such as white noise. In addition, the proposed algorithm is fast and requires no training.
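
The processing chain named in the abstract (zero-frequency filtering, Teager energy, Hilbert envelope, peak picking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 10 ms trend-removal window, the three trend-removal passes, the 20 ms smoothing window, the 100 ms minimum peak spacing, and the relative height threshold of 0.3 are all assumptions chosen for illustration, and the input is a synthetic pulse train rather than Gujarati speech.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks


def zero_frequency_filter(x, sr, trend_win_s=0.010, passes=3):
    """ZFF sketch: difference, integrate four times, subtract the local mean."""
    x = np.asarray(x, dtype=np.float64)
    y = np.diff(x, prepend=0.0)            # first-order difference of the input
    for _ in range(4):                     # two cascaded zero-frequency resonators
        y = np.cumsum(y)                   # (each one is a double integrator)
    half = max(1, int(trend_win_s * sr / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):                # repeated local-mean (trend) removal
        y = y - np.convolve(y, kernel, mode="same")
    edge = passes * half                   # samples affected by zero padding
    y[:edge] = 0.0
    y[-edge:] = 0.0
    return y


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def detect_vowel_landmarks(x, sr, smooth_s=0.020, min_gap_s=0.100, rel_height=0.3):
    """Return candidate landmark times (s) as peaks of the smoothed envelope."""
    zff = zero_frequency_filter(x, sr)
    env = np.abs(hilbert(teager_energy(zff)))      # Hilbert envelope
    n = max(1, int(smooth_s * sr))
    env = np.convolve(env, np.ones(n) / n, mode="same")
    peak = env.max()
    if peak <= 0:
        return np.array([])
    idx, _ = find_peaks(env / peak, height=rel_height,
                        distance=max(1, int(min_gap_s * sr)))
    return idx / sr


# Synthetic test input (an assumption, not real speech): low-level noise with
# one "vowel" made of a Hann-weighted pulse train between 0.4 s and 0.6 s.
sr = 8000
rng = np.random.default_rng(0)
signal = 1e-4 * rng.standard_normal(sr)
start, stop = int(0.4 * sr), int(0.6 * sr)
pulses = np.zeros(stop - start)
pulses[:: sr // 120] = 1.0                 # roughly 120 pulses per second
signal[start:stop] += pulses * np.hanning(stop - start)

landmarks = detect_vowel_landmarks(signal, sr)
print(landmarks)
```

The sketch needs no training data, which matches the abstract's claim for the proposed method. The thresholds would have to be tuned on real recordings, and long signals may need block-wise processing because the four integrations grow quickly in magnitude.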

Keywords :

filtering theory; speech processing; speech recognition; Gujarati language; Hilbert envelope; Indian language; Teager energy profile; VLD algorithm; ZFF speech signal; consonants; excitation peaks; glides; ground truth; high-sonority points; lecture mode; loudness; low-resource language; manual phonetic transcription; perceptually significant feature; read mode; signal degradation robustness; signal detection rate; speech source information; spontaneous mode; time instants; vowel landmark detection algorithm; white noise; zero-frequency filtered speech signal; Acoustics; Databases; Feature extraction; Signal to noise ratio; Speech; Speech processing; System-on-chip; Landmark; Teager energy operator (TEO); loudness; sonority; vowel-nucleus; zero-frequency resonator (ZFR);

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location :

Singapore

Type :

conf

DOI :

10.1109/ISCSLP.2014.6936660

Filename :

6936660

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=134267