Statistical syllables selection approach for the preparation of Punjabi speech database

Author

Singh, Parminder ; Lehal, Gurpreet Singh

Author_Institution

Dept. of Comput. Sci. & Eng., Nanak Dev Eng. Coll., India

fYear

2010

fDate

8-11 Nov. 2010

Firstpage

1

Lastpage

4

Abstract

This paper presents the results of a statistical analysis of Punjabi syllables over a large Punjabi corpus. Syllables have been reported as a good choice of speech unit for the speech databases of many languages, and they have also been selected here as the speech unit for developing the Punjabi speech database. To minimize the database size, efforts have been made to select a minimal set of syllables covering almost the whole Punjabi word set. To this end, all Punjabi syllables have been statistically analyzed on a Punjabi corpus of more than 104 million words. The analysis yields important results that help to select a relatively small set of the most frequently occurring syllables (about the first ten thousand syllables, or 0.86% of the total) having cumulative frequency of occurrence (FOO) less than 99.81%, out of 1156740 total available syllables. In addition, to improve the efficiency of the text-to-speech (TTS) system, interesting facts about Punjabi syllables have been obtained from their FOO at the three positions (starting, middle and end) in words.
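
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy counts, and the romanized syllable strings are hypothetical, and the rule that a one-syllable word counts only as a "start" occurrence is an assumption made here for simplicity. The first function ranks syllables by frequency and keeps the shortest prefix whose cumulative share of occurrences reaches a target; the second tallies occurrences by position in the word.

```python
from collections import Counter


def select_top_syllables(syllable_counts, coverage):
    """Return (selected, share): the shortest frequency-ranked prefix of
    syllables whose cumulative share of all occurrences reaches `coverage`."""
    total = sum(syllable_counts.values())
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(syllable_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    selected = []
    running = 0
    for syllable, count in ranked:
        if running / total >= coverage:
            break
        selected.append(syllable)
        running += count
    return selected, running / total


def positional_counts(syllabified_words):
    """Count syllable occurrences at the start, middle and end of words.

    Each word is a list of syllables. Assumption: a one-syllable word
    contributes only to the "start" tally.
    """
    counts = {"start": Counter(), "middle": Counter(), "end": Counter()}
    for word in syllabified_words:
        last = len(word) - 1
        for i, syllable in enumerate(word):
            if i == 0:
                position = "start"
            elif i == last:
                position = "end"
            else:
                position = "middle"
            counts[position][syllable] += 1
    return counts


# Toy data only; these are not figures from the paper.
toy_counts = {"a": 50, "b": 30, "c": 15, "d": 5}
chosen, share = select_top_syllables(toy_counts, coverage=0.9)

toy_words = [["pa", "ni"], ["ka", "ma", "ra"], ["pa"]]
by_position = positional_counts(toy_words)
```

As a check on the figures quoted in the abstract, ten thousand syllables out of 1156740 is 10000 / 1156740, or roughly 0.86% of the total.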

Keywords

database management systems; speech synthesis; statistical analysis; FOO; Punjabi corpus; Punjabi speech database; Punjabi word set; TTS; database size; frequency of occurrence; speech unit; statistical syllables selection approach; text-to-speech

fLanguage

English

Publisher

IEEE

Conference_Titel

Internet Technology and Secured Transactions (ICITST), 2010 International Conference for

Conference_Location

London

Print_ISBN

978-1-4244-8862-9

Electronic_ISBN

978-0-9564263-6-9

Type

conf

Filename

5678557