Title :
Acoustic modeling for under-resourced languages: A role in Vietnamese soccer video retrieval
Author :
Pham, Nhut M. ; Vu, Quan H.
Author_Institution :
Artificial Intell. Lab., Univ. of Sci., Ho Chi Minh City, Vietnam
Abstract :
Insufficient training data poses a major challenge to acoustic modeling in automatic speech recognition. The problem becomes more severe for under-resourced languages and for specific domains that have received little research attention. This paper explores the role of under-resourced acoustic models in speech-based soccer event retrieval. An event is defined as a spatiotemporal entity of interest to users, marked by the announcer's spoken words. Soccer events are detected by mining spoken information from the video with a speech recognition system. To address the issue of limited training data, subspace Gaussian mixture models are employed. Experimental evaluations are conducted on the first round of the World Cup 2010 and the Vietnamese AFF Suzuki-cup 2008 databases. In the best case, transcription accuracy reaches 74.3%, and an average event detection rate of 60.62% is obtained.
Keywords :
Gaussian processes; mixture models; speech recognition; video retrieval; Gaussian mixture models; Vietnamese AFF Suzuki-cup 2008 databases; Vietnamese soccer video retrieval; World Cup 2010; acoustic modeling; automatic speech recognition; spatiotemporal entity; speech recognition system; speech-based soccer event retrieval; under-resourced acoustic models; under-resourced languages; Acoustics; Databases; Gaussian mixture model; Hidden Markov models; Speech; Speech recognition; Training data; acoustic modeling; event detection; soccer video; speech recognition; under-resourced language;
Conference_Title :
Advanced Technologies for Communications (ATC), 2013 International Conference on
Conference_Location :
Ho Chi Minh City
Print_ISBN :
978-1-4799-1086-1
DOI :
10.1109/ATC.2013.6698195