DocumentCode :
2392
Title :
Acoustic Segment Modeling with Spectral Clustering Methods
Author :
Haipeng Wang ; Tan Lee ; Cheung-Chi Leung ; Bin Ma ; Haizhou Li
Author_Institution :
Dept. of Electr. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
Volume :
23
Issue :
2
fYear :
2015
fDate :
Feb. 2015
Firstpage :
264
Lastpage :
277
Abstract :
This paper presents a study of spectral clustering-based approaches to acoustic segment modeling (ASM). ASM aims at finding the underlying phoneme-like speech units and building the corresponding acoustic models in the unsupervised setting, where no prior linguistic knowledge or manual transcriptions are available. A typical ASM process involves three stages, namely initial segmentation, segment labeling, and iterative modeling. This work focuses on improving segment labeling. Specifically, we use posterior features as the segment representations and apply spectral clustering algorithms to the posterior representations. We propose a Gaussian component clustering (GCC) approach and a segment clustering (SC) approach. GCC applies spectral clustering to a set of Gaussian components, and SC applies spectral clustering to a large number of speech segments. Moreover, to exploit the complementary information of different posterior representations, a multiview segment clustering (MSC) approach is proposed. MSC simultaneously utilizes multiple posterior representations to cluster speech segments. To address the computational cost of spectral clustering when dealing with large numbers of speech segments, we use an inner product similarity graph and reformulate the problem to avoid explicitly computing the affinity matrix and the Laplacian matrix. We carried out two sets of experiments for evaluation. First, we evaluated the ASM accuracy on the OGI-MTS dataset, where our approach yielded an 18.7% relative purity improvement and a 15.1% relative NMI improvement over the baseline approach. Second, we examined the performance of our approaches in the real application of zero-resource query-by-example spoken term detection on the SWS2012 dataset, where our approaches provided consistent improvements in four different testing scenarios under three evaluation metrics.
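The idea of clustering on an inner product similarity graph without forming the affinity matrix can be illustrated with a minimal sketch. It is not the paper's exact formulation. It assumes each segment is a nonnegative posterior vector (one row of `X`), takes the affinity to be `W = X @ X.T`, and uses a normalized-cut style normalization, which is an assumption here. The function names `spectral_embed_inner_product`, `kmeans`, and `cluster_segments` are hypothetical, and the small NumPy k-means is only a stand-in for whatever final clustering step is used.

```python
import numpy as np

def spectral_embed_inner_product(X, k):
    """Top-k spectral embedding for the implicit affinity W = X @ X.T."""
    X = np.asarray(X, dtype=float)
    # Degrees of the implicit graph: W @ 1 = X @ (X.T @ 1); W is never built.
    deg = X @ X.sum(axis=0)
    Z = X / np.sqrt(deg)[:, None]
    # Z @ Z.T equals D^{-1/2} W D^{-1/2}, so its leading eigenvectors are the
    # leading left singular vectors of the thin (n x dim) matrix Z.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    E = U[:, :k]
    norms = np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    return E / norms  # row-normalize before clustering

def kmeans(E, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [E[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels

def cluster_segments(X, k):
    """Cluster the rows of X into k groups via the implicit inner product graph."""
    return kmeans(spectral_embed_inner_product(X, k), k)
```

With `n` segments of dimension `dim`, the sketch works on an `n x dim` matrix rather than an `n x n` affinity matrix, which is the point of the reformulation when `n` is large and `dim` is small.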
Keywords :
Gaussian processes; acoustic signal processing; graph theory; pattern clustering; signal representation; speech processing; ASM process; GCC approach; Gaussian component clustering approach; Laplacian matrix; MSC approach; SWS2012 dataset; acoustic segment modeling; affinity matrix; initial segmentation; inner product similarity graph; iterative modeling; multiple posterior representations; multiview segment clustering approach; phoneme-like speech units; posterior features; segment labeling; segment representations; spectral clustering methods; speech segments; zero-resource query-by-example spoken term detection; Acoustics; Hidden Markov models; Labeling; Mathematical model; Speech; Speech processing; Vectors; Acoustic segment modeling; multiview segment clustering; sub-word unit discovery; unsupervised training; zero-resource query-by-example spoken term detection;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2387382
Filename :
7001242