Regional Information Center for Science and Technology - Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

DocumentCode :

1696750

Title :

Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

Author :

Haipeng Wang ; Tan Lee ; Cheung-Chi Leung ; Bin Ma ; Haizhou Li

Author_Institution :

Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China

Year :

2013

Firstpage :

8545

Lastpage :

8549

Abstract :

Recently, the posteriorgram-based template matching framework has been successfully applied to query-by-example spoken term detection tasks for low-resource languages. This framework employs a tokenizer to derive posteriorgrams and applies dynamic time warping (DTW) to the posteriorgrams to locate possible occurrences of a query term. Building on this framework, we propose to improve detection performance by using multiple tokenizers with DTW distance matrix combination. The proposed approach runs multiple tokenizers in parallel as the front-end to generate different posteriorgram representations, and combines the distance matrices computed from the different posteriorgrams into a single matrix. DTW detection is then applied to the combined distance matrix. Lastly, score post-processing techniques, including pseudo-relevance feedback and score normalization, are applied for further improvement. Experiments were conducted on the spoken web search datasets of MediaEval 2011 and MediaEval 2012. Experimental results show that combining multiple tokenizers significantly outperforms the best single tokenizer, and that the DTW matrix combination method consistently outperforms the score combination method when more than three tokenizers are involved. The score post-processing techniques yield further gains on top of the multiple-tokenizer system.
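The abstract describes the pipeline only at a high level. The following is a minimal Python sketch, under stated assumptions, of how per-tokenizer DTW distance matrices might be combined before a single DTW search, with the score-combination baseline included for contrast. The -log inner-product frame distance, equal combination weights, the subsequence-DTW step pattern and z-normalization are illustrative assumptions that the abstract does not confirm; all function names are hypothetical, and pseudo-relevance feedback is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def posteriorgram_distance(query: Sequence[Sequence[float]],
                           utterance: Sequence[Sequence[float]],
                           eps: float = 1e-10) -> Matrix:
    """Distance matrix D[i][j] = -log(q_i . x_j) between two posteriorgrams.

    ASSUMPTION: -log inner product is a common posteriorgram distance;
    the abstract does not state which distance the paper uses.
    """
    return [[-math.log(max(sum(a * b for a, b in zip(q, x)), eps))
             for x in utterance]
            for q in query]


def combine_matrices(mats: Sequence[Matrix],
                     weights: Optional[Sequence[float]] = None) -> Matrix:
    """Weighted average of same-shaped distance matrices (equal weights by default).

    Every tokenizer yields a (query frames x utterance frames) matrix, so the
    matrices can be combined even when the posterior dimensions differ.
    """
    if weights is None:
        weights = [1.0 / len(mats)] * len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, mats))
             for j in range(cols)]
            for i in range(rows)]


def subsequence_dtw(dist: Matrix) -> Tuple[float, int]:
    """Align the whole query (rows) to the best-matching utterance segment (columns).

    Returns (accumulated cost, end frame). The start frame is free because
    the first row is not accumulated along the utterance axis.
    """
    m, n = len(dist), len(dist[0])
    acc = [list(dist[0])] + [[math.inf] * n for _ in range(m - 1)]
    for i in range(1, m):
        for j in range(n):
            best = acc[i - 1][j]                      # vertical step
            if j > 0:
                best = min(best, acc[i - 1][j - 1],   # diagonal step
                           acc[i][j - 1])             # horizontal step
            acc[i][j] = dist[i][j] + best
    end = min(range(n), key=lambda j: acc[m - 1][j])
    return acc[m - 1][end], end


def matrix_combination_score(mats: Sequence[Matrix]) -> float:
    """Combine the matrices first, then run DTW once (higher score = better match)."""
    cost, _ = subsequence_dtw(combine_matrices(mats))
    return -cost


def score_combination_score(mats: Sequence[Matrix]) -> float:
    """Baseline for contrast: run DTW per tokenizer, then average the scores."""
    return sum(-subsequence_dtw(m)[0] for m in mats) / len(mats)


def z_normalize(scores: Sequence[float]) -> List[float]:
    """Per-query zero-mean, unit-variance normalization of detection scores.

    ASSUMPTION: z-norm is one common form of score normalization; the
    abstract does not specify the normalization used in the paper.
    """
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores)) or 1.0
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    # Toy posteriorgrams from two hypothetical tokenizers with different
    # numbers of classes (3 and 2) for the same query and utterance.
    d_a = posteriorgram_distance([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
                                 [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1],
                                  [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
    d_b = posteriorgram_distance([[0.9, 0.1], [0.2, 0.8]],
                                 [[0.5, 0.5], [0.9, 0.1],
                                  [0.2, 0.8], [0.5, 0.5]])
    cost, end = subsequence_dtw(combine_matrices([d_a, d_b]))
    print(f"combined-matrix cost={cost:.4f}, end frame={end}")
    print("matrix combination:", matrix_combination_score([d_a, d_b]))
    print("score combination: ", score_combination_score([d_a, d_b]))
```

The key design difference is that matrix combination forces all tokenizers to share a single warping path, whereas score combination lets each tokenizer choose its own path before the scores are merged, so the two can produce different scores when the tokenizers prefer different alignments.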

Keywords :

audio databases; feedback; query processing; signal representation; speech recognition; DTW detection; DTW distance matrix combination; MediaEval 2011; MediaEval 2012; detection performance; dynamic time warping; parallel tokenizers; posteriorgram representations; posteriorgram-based template matching; pseudo-relevance feedback; query-by-example spoken term detection; score normalization; score post-processing; spoken web search datasets; Acoustics; Educational institutions; Matrix converters; Robustness; Speech; Training; Vectors; DTW matrix combination; tandem tokenizer

Language :

English

Publisher :

IEEE

Conference_Title :

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639333

Filename :

6639333

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1696750