Title :
Toward unsupervised model-based spoken term detection with spoken queries without annotated data
Author :
Chun-an Chan ; Cheng-Tao Chung ; Yu-Hsin Kuo ; Lin-Shan Lee
Author_Institution :
Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Abstract :
We present a two-stage model-based approach for unsupervised query-by-example spoken term detection (STD) without any annotated data. Compared to the prevailing dynamic time warping (DTW) approaches for the unsupervised STD task, the HMMs used by model-based approaches can better capture the signal distributions and time trajectories of speech with a more global view of the spoken archive; matching with model states also significantly reduces the computational load. The utterances in the spoken archive are first decoded offline into acoustic patterns that are automatically discovered from the archive in an unsupervised way. In the first stage, we propose a document state matching (DSM) approach, where query frames are matched to the HMM state sequences for the spoken documents. In this process, a novel duration-constrained Viterbi (DC-Vite) algorithm is proposed to avoid unrealistic speaking rate distortion. In the second stage, pseudo-relevant and pseudo-irrelevant examples retrieved from the first stage are used to construct query and anti-query HMMs, respectively. Each spoken term hypothesis is then rescored with the likelihood ratio between these two HMMs. Experimental results show an absolute improvement of 11.8% in mean average precision, with a reduction of more than 50% in computation time, compared to the segmental DTW approach on a Mandarin broadcast news corpus.
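The second-stage rescoring described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes discrete observation symbols instead of acoustic feature frames, uses hand-picked toy parameters for the two models, and all names (forward_loglik, llr_score, QUERY_HMM, ANTI_HMM) are hypothetical. It only shows the idea of scoring a hypothesis by the log-likelihood ratio between a query HMM and an anti-query HMM.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def to_log(rows):
    """Convert a matrix of strictly positive probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


def forward_loglik(obs, log_init, log_trans, log_emit):
    """Log-likelihood of a discrete observation sequence (forward algorithm)."""
    n_states = len(log_init)
    alpha = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n_states)])
            + log_emit[s][o]
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def llr_score(obs, query_hmm, anti_hmm):
    """Log-likelihood ratio: higher means the hypothesis looks more like the query."""
    return forward_loglik(obs, *query_hmm) - forward_loglik(obs, *anti_hmm)


# Toy models (assumed values, not from the paper): two states, two symbols.
# The query model expects symbol 0 followed by symbol 1.
QUERY_HMM = (
    [math.log(0.9), math.log(0.1)],          # initial state probabilities
    to_log([[0.7, 0.3], [0.1, 0.9]]),        # state transition probabilities
    to_log([[0.9, 0.1], [0.1, 0.9]]),        # emission probabilities per state
)
# The anti-query model is uniform, standing in for "everything else".
ANTI_HMM = (
    [math.log(0.5), math.log(0.5)],
    to_log([[0.5, 0.5], [0.5, 0.5]]),
    to_log([[0.5, 0.5], [0.5, 0.5]]),
)

hit_score = llr_score([0, 0, 1, 1], QUERY_HMM, ANTI_HMM)   # matches the query pattern
miss_score = llr_score([1, 1, 0, 0], QUERY_HMM, ANTI_HMM)  # reversed pattern
```

In this toy setting the hypothesis that follows the query pattern receives a positive score and the reversed one a negative score, so ranking hypotheses by the ratio pushes query-like segments to the top.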
Keywords :
document handling; hidden Markov models; natural language processing; query processing; speech processing; unsupervised learning; DC-Vite algorithm; DSM approach; HMM state sequences; HMMs; Mandarin broadcast news corpus; STD; acoustic patterns; annotated data; computational load; document state matching; duration-constrained Viterbi; query frames; query-anti-query HMM; signal distributions; speaking rate distortion; speech time trajectories; spoken archive; spoken documents; spoken queries; unsupervised model based spoken term detection; unsupervised query-by-example spoken term detection; Acoustics; Computational modeling; Hidden Markov models; Speech; Speech recognition; Training; Viterbi algorithm; Unsupervised spoken term detection; query-by-example; speech pattern discovery; zero-resource;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639334