DocumentCode :
1696750
Title :
Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection
Author :
Haipeng Wang ; Tan Lee ; Cheung-Chi Leung ; Bin Ma ; Haizhou Li
Author_Institution :
Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
fYear :
2013
Firstpage :
8545
Lastpage :
8549
Abstract :
Recently, the posteriorgram-based template matching framework has been successfully applied to query-by-example spoken term detection tasks for low-resource languages. This framework employs a tokenizer to derive posteriorgrams, and applies dynamic time warping (DTW) to the posteriorgrams to locate the possible occurrences of a query term. Based on this framework, we propose to improve the detection performance by using multiple tokenizers with DTW distance matrix combination. The proposed approach uses multiple tokenizers in parallel as the front-end to generate different posteriorgram representations, and combines the distance matrices of the different posteriorgrams into a single matrix. DTW detection is then applied to the combined distance matrix. Lastly, score post-processing techniques, including pseudo-relevance feedback and score normalization, are used for further improvement. Experiments were conducted on the spoken web search datasets of MediaEval 2011 and MediaEval 2012. Experimental results show that combining multiple tokenizers significantly outperforms the best single tokenizer, and that the DTW matrix combination method consistently outperforms the score combination method when more than three tokenizers are involved. Score post-processing techniques show further gains on top of using multiple tokenizers.
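The pipeline described in the abstract (parallel tokenizers, one distance matrix per tokenizer, combination into a single matrix, then DTW) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the negative-log inner-product frame distance, the equal-weight average used to combine matrices, the simple subsequence DTW with query-length normalization, and all function names and toy posteriorgrams.

```python
import math

def frame_distance(p, q, eps=1e-10):
    # Assumed distance: negative log of the inner product of two posterior vectors.
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))

def distance_matrix(query, utterance):
    # Rows index query frames, columns index utterance frames.
    return [[frame_distance(q, u) for u in utterance] for q in query]

def combine_matrices(matrices, weights=None):
    # Assumed combination rule: weighted average, equal weights by default.
    if weights is None:
        weights = [1.0 / len(matrices)] * len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(cols)] for i in range(rows)]

def subsequence_dtw(dist):
    # The query may start and end at any utterance frame (free start and end).
    rows, cols = len(dist), len(dist[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        acc[0][j] = dist[0][j]
    for i in range(1, rows):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        for j in range(1, cols):
            acc[i][j] = dist[i][j] + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    end = min(range(cols), key=lambda j: acc[rows - 1][j])
    # Simplified normalization by query length rather than by warping-path length.
    return acc[rows - 1][end] / rows, end

# Toy posteriorgrams from two hypothetical tokenizers with different class inventories.
query_a = [[1.0, 0.0], [0.0, 1.0]]
utt_a = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
utt_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

combined = combine_matrices([distance_matrix(query_a, utt_a),
                             distance_matrix(query_b, utt_b)])
cost, end = subsequence_dtw(combined)
print(cost, end)
```

Because the matrices are combined before DTW, a single warping path is found that has to agree with all tokenizers at once, whereas score combination would run DTW separately per tokenizer and merge only the final scores. In the toy data, the second tokenizer resolves an ambiguity in the first one about where the query ends.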
Keywords :
audio databases; feedback; query processing; signal representation; speech recognition; DTW detection; DTW distance matrix combination; MediaEval 2011; MediaEval 2012; detection performance; dynamic time warping; parallel tokenizers; posteriorgram representations; posteriorgram-based template matching; pseudorelevance feedback; query-by-example spoken term detection; score normalization; score post-processing; spoken web search datasets; Acoustics; Educational institutions; Matrix converters; Robustness; Speech; Training; Vectors; DTW matrix combination; pseudo-relevance feedback; tandem tokenizer
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639333
Filename :
6639333