Title :
Double-layer neighborhood graph based similarity search for fast query-by-example spoken term detection
Author :
Aoyama, Kazuo ; Ogawa, Atsunori ; Hattori, Takashi ; Hori, Takaaki
Author_Institution :
NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
Abstract :
This paper presents a novel double-layer neighborhood graph index that accelerates similarity search and thereby enables fast query-by-example spoken term detection (STD). Given a query segment, the proposed STD method finds segments similar to the query in an utterance data set by an efficient similarity search that traverses the double-layer neighborhood graph (DLG) at low computational cost. A segment is a sequence of Gaussian mixture model posteriorgram frames and corresponds to a vertex in the DLG. The dissimilarity between vertices is measured by dynamic time warping. The DLG consists of two distinct degree-reduced k-nearest neighbor graphs, one in a base layer and one in an upper layer. The base layer's graph contains all the vertices in the data set, while the upper layer's graph includes only representatives extracted from the vertices in the base layer. By analogy, search in the DLG resembles a drive that combines general roads and expressways to save travel time. Experimental results on the MIT lecture corpus demonstrate that the proposed method reduces CPU time by 40% compared to the most recent method and by more than 60% compared to the ordinary graph-based method, while keeping almost the same precision.
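The two-layer traversal described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the frame distance (negative log inner product of posterior vectors), the greedy best-neighbor descent, and the names `dtw`, `frame_dist`, `greedy_search`, and `two_layer_search` are all assumptions made for illustration. Construction of the degree-reduced k-nearest neighbor graphs and selection of the upper-layer representatives are not shown; both graphs are assumed to be given as adjacency lists.

```python
import math


def frame_dist(p, q):
    # Assumed frame-level distance between two posterior vectors:
    # negative log of their inner product (floored to avoid log(0)).
    return -math.log(max(sum(x * y for x, y in zip(p, q)), 1e-12))


def dtw(a, b, dist=frame_dist):
    # Standard dynamic time warping cost between frame sequences a and b.
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def greedy_search(graph, segments, query, start):
    # Move to the neighbor closest to the query until no neighbor improves.
    cur, cur_d = start, dtw(query, segments[start])
    while True:
        best, best_d = cur, cur_d
        for nb in graph[cur]:
            d = dtw(query, segments[nb])
            if d < best_d:
                best, best_d = nb, d
        if best == cur:
            return cur, cur_d
        cur, cur_d = best, best_d


def two_layer_search(upper, base, segments, query, entry):
    # Upper layer ("expressway"): coarse hops between representatives only.
    rep, _ = greedy_search(upper, segments, query, entry)
    # Base layer ("general roads"): refine from the representative reached.
    return greedy_search(base, segments, query, rep)
```

In this sketch the upper-layer search reaches the neighborhood of the query in a few long hops, so the base-layer search starts close to the answer and evaluates fewer dynamic time warping distances than a base-layer search from an arbitrary vertex would.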
Keywords :
Gaussian processes; graph theory; mixture models; query processing; set theory; speech recognition; CPU time reduction; DLG; Gaussian mixture model posteriorgram frames; MIT lecture corpus; STD method; base layer; degree-reduced k-nearest neighbor graphs; dissimilarity measurement; double-layer neighborhood graph based similarity search; double-layer neighborhood graph index; dynamic time warping; fast query-by-example spoken term detection; low computational cost; query segment; travel-time saving; upper layer; utterance data set; Indexes; Dynamic time warping; Neighborhood graph; Query-by-example search; Search index; Spoken term detection;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178966