DocumentCode
2964889
Title
Evaluation of smoothing techniques for language modeling in automatic Filipino speech recognition
Author
Ang, F.M. ; Ancheta, J.C.M.C. ; Francia, K.M.F. ; Chua, K.G.
Author_Institution
Digital Signal Process. Lab., Univ. of the Philippines - Diliman, Quezon City, Philippines
fYear
2012
fDate
19-22 Nov. 2012
Firstpage
1
Lastpage
5
Abstract
It is widely known that smoothing techniques are essential for n-gram-based statistical language modeling, especially in large vocabulary continuous speech recognition (LVCSR) tasks. The goal in this paper is to investigate several smoothing algorithms for n-gram models in Filipino LVCSR. The automatic speech recognition system was developed using the Janus Speech Recognition Toolkit (JRTk) of Carnegie Mellon University and Karlsruhe Institute of Technology. The language models were generated using Stanford's language modeling toolkit, SRILM. The data consisted of approximately 60 hours of transcribed recordings of Filipino speech from several domains spoken by 156 speakers. A total of 24 systems employing different language models were fine-tuned and tested for improved performance at a base metric. An instance of the Kneser-Ney algorithm with modified-at-end counts applied to an n-gram of order 5 registered the highest word recognition accuracy at 80.9% and 81.3% for the development and evaluation tests, respectively.
Keywords
natural language processing; smoothing methods; speech recognition; Janus speech recognition toolkit; Kneser-Ney algorithm; Stanford language modeling toolkit; automatic Filipino speech recognition; large vocabulary continuous speech recognition task; n gram model; smoothing algorithm; smoothing technique; statistical language modeling; transcribed recordings; word recognition accuracy; Irrigation; Filipino speech recognition; JRTk; LVCSR; language modeling
fLanguage
English
Publisher
IEEE
Conference_Titel
TENCON 2012 - 2012 IEEE Region 10 Conference
Conference_Location
Cebu
ISSN
2159-3442
Print_ISBN
978-1-4673-4823-2
Electronic_ISBN
2159-3442
Type
conf
DOI
10.1109/TENCON.2012.6412249
Filename
6412249