Regional Information Center for Science and Technology - An acoustic segment modeling approach to query-by-example spoken term detection

DocumentCode :

3167498

Title :

An acoustic segment modeling approach to query-by-example spoken term detection

Author :

Wang, Haipeng ; Leung, Cheung-Chi ; Lee, Tan ; Ma, Bin ; Li, Haizhou

Author_Institution :

Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Hong Kong, China

Year :

2012

Date :

25-30 March 2012

Firstpage :

5157

Lastpage :

5160

Abstract :

The framework of posteriorgram-based template matching has been shown to be successful for query-by-example spoken term detection (STD). This framework employs a tokenizer to convert query examples and test utterances into frame-level posteriorgrams, and applies dynamic time warping to match the query posteriorgrams against the test posteriorgrams and locate possible occurrences of the query term. Designing a reliable tokenizer is not trivial because of heterogeneous test conditions and limited training resources. This paper presents a study of using acoustic segment models (ASMs) as the tokenizer. ASMs can be obtained through an unsupervised iterative procedure without any training transcriptions. The STD performance of the ASM tokenizer is evaluated on the Fisher Corpus in comparison with three alternative tokenizers. Experimental results show that the ASM tokenizer outperforms a conventional GMM tokenizer and a language-mismatched phoneme recognizer. In addition, the performance is significantly improved by applying unsupervised speaker normalization techniques.
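
The matching step described in the abstract can be illustrated with a minimal sketch, assuming that the posteriorgrams are already available as NumPy arrays of shape (frames, classes). The local distance used here (negative log of the inner product between posterior vectors) and the subsequence form of dynamic time warping are assumptions chosen for illustration; the abstract does not specify either, and the function names are hypothetical.

```python
import numpy as np

def posteriorgram_distance(query, test, eps=1e-10):
    """Local distance matrix: -log of the inner product of posterior vectors.

    Assumed distance measure; the abstract does not name one.
    """
    return -np.log(np.clip(query @ test.T, eps, None))

def subsequence_dtw(query, test):
    """Return (length-normalized cost, end frame) of the best match of
    the query posteriorgram anywhere inside the test posteriorgram."""
    dist = posteriorgram_distance(query, test)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may start at any test frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            acc[i, j] = dist[i, j] + min(
                acc[i - 1, j],      # query advances, test frame repeats
                acc[i, j - 1],      # test advances, query frame repeats
                acc[i - 1, j - 1],  # both advance
            )
    end = int(np.argmin(acc[-1, :]))  # the match may end at any test frame
    return acc[-1, end] / n, end
```

A low returned cost suggests that the query term may occur in the test utterance ending near the returned frame; in practice a detection threshold would have to be tuned on held-out data.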

Keywords :

iterative methods; query processing; speech recognition; unsupervised learning; ASM tokenizer; GMM tokenizer; STD; acoustic segment modeling approach; acoustic segment models; dynamic time warping; Fisher Corpus; frame-level posteriorgrams; language-mismatched phoneme recognizer; posteriorgram-based template matching framework; query-by-example spoken term detection; test utterances; unsupervised iterative procedure; unsupervised speaker normalization techniques; Acoustics; Hidden Markov models; Measurement; Speech; Speech recognition; Training; Training data; Spoken term detection; acoustic segment model; posteriorgram-based template matching; query-by-example

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6289081

Filename :

6289081

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3167498