DocumentCode
3582547
Title
Identification of phoneme and its distribution of Malay language derived from Friday sermon transcripts
Author
Asyafie, Muhammad Aasim ; Harun, Mokhtar ; Shapiai, Mohd Ibrahim ; Khalid, Puspa Inayat
Author_Institution
Fac. of Electr. Eng., Univ. Teknol. Malaysia, Johor Bahru, Malaysia
fYear
2014
Firstpage
1
Lastpage
6
Abstract
A lack of text data is one of the main issues encountered by Malay speech researchers. Currently, there are few established Malay text corpora to aid their research. Text corpora are essential because they provide empirical data for researchers in the field of linguistics and are useful for constructing word lists for speech intelligibility tests, speech analysis across genders and automatic speech recognition. A text corpus also needs to mimic the natural phoneme distribution of the language it represents. To accomplish this, we need to know the phonetic distribution of the language. The purpose of this research is to derive a phoneme distribution for the Malay language based on transcripts obtained from fifty-two Friday sermons. The Friday sermon transcripts were obtained from the official government website and then standardized by removing images and foreign letters, expanding acronyms and short forms, and converting numbers and symbols to the appropriate Malay words. The transcripts were then phonetically transcribed by first identifying the language rules and then writing a program based on those rules. The program was written in PHP and the data were then stored in a MySQL (Structured Query Language) database. The data were then retrieved and compared to the Malay words used in news broadcasts. In conclusion, the Malay used in Friday sermons and in news broadcasts differs in the usage of the phonemes /a/, /e/, /o/, /d/, /p/, /tʃ/, /n/, /l/, /h/ and /r/.
Keywords
SQL; Web sites; computational linguistics; speech processing; speech recognition; text analysis; Friday sermon transcripts; Malay text corpora; Malay words; MySQL; PHP; acronyms; automatic speech recognition; data retrieval; data storage; empirical data; foreign letter removal; genders; image letter removal; language rules; linguistics field; natural phoneme; news broadcast; official government Web site; personal home page; phoneme distribution; phoneme identification; phoneme usage; phonetic distribution; phonetic transcription; sequential query language; short forms; speech analysis; speech intelligibility test; text data; word lists; Companies; Context; Correlation; Databases; Speech; Spreadsheet programs; Terminology; Bahasa Melayu; speech clarity; speech intelligibility
fLanguage
English
Publisher
IEEE
Conference_Titel
Research and Development (SCOReD), 2014 IEEE Student Conference on
Print_ISBN
978-1-4799-6427-7
Type
conf
DOI
10.1109/SCORED.2014.7072964
Filename
7072964