Corpus design and development of an annotated speech database for Punjabi

Author

Shweta Bansal;Shambhu Sharan;S.S. Agrawal

Author_Institution

KUT College of Engineering, Gurgaon, India

fYear

2015

Firstpage

32

Lastpage

37

Abstract

Punjabi is an important Indo-Aryan language spoken in India and in some other countries, especially Pakistan. It is a tonal language, and its phonetic and phonological aspects have not been studied extensively. The present paper reports the development of a phonemically annotated speech database of the Malwai dialect of Punjabi. A phonetically rich text database of 1500 words and 300 sentences was created from a corpus of about 300,000 words. These were recorded by 25 male and 25 female speakers at a sampling rate of 16 kHz and 16 bits per sample. The recordings were made in the native places of speakers possessing the original version of the Malwai dialect of Punjabi. The recorded data were segmented and labeled phonemically to obtain the phonemic and sub-phonemic elements of each phoneme and the tonemes of the Punjabi language. The annotated database can be useful for phonetic studies and for developing a Punjabi speech synthesis system.

Keywords

"Databases","Speech","Frequency modulation","Dentistry"

Publisher

IEEE

Conference_Titel

Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference

Type

conf

DOI

10.1109/ICSDA.2015.7357860

Filename

7357860